Saturday, 20 January 2018

52 Ancestors in 52 Weeks: Week 2 - Favourite Photo

(I think I'm going to be interpreting the week numbers loosely in this series!)

Week 2's prompt was "Favorite Photo." -  Tell the story of the people, place, and event in a favourite photo. Where did the photo come from? Who has the original now? How did you get a copy of it?

It's difficult to choose one favourite photograph, amongst all the thousands I have but, in keeping with my aim of working through my ancestors systematically this year, I've chosen this one:

These are my paternal grandparents, Donald PARRY and Elsie THOMAS.  The photograph was taken at a party to celebrate a special wedding anniversary.  It's the first in a series of photographs that gradually expand through the family, to include their children, children's spouses, grandchildren and then, finally, everyone who was at the party.  Including the piano player who, according to a story told to me years later, just happened to be someone in the pub at the time who was able to play the piano!

I don't think there is one 'original' print of the photograph - all of the main family members had copies.

In this post, I'm focusing on my grandfather, who was born on 3rd February 1904, in Mordiford, Herefordshire, the son of John PARRY and Rosina Louisa, formerly PREECE.  Donald was baptised on 17 April 1904 - I was rather surprised when I first found that baptism entry on the British Vital Records CDs, as it was less than 100 years old at the time.  But thanks to that surprise, which led me to enthusiastically show the entry to my mother, and then randomly decide to search for one of her "brick wall" ancestors, we solved that brick wall of hers!  (I'll save the details of who that was, for when I tell that ancestor's story.)

Donald had one sister, Rosina Jane, who was born in Hereford on 5th April 1905.  Sadly, their mother, Rosina Louisa, passed away sixteen days after giving birth to her daughter.

One can only imagine what Donald's early life must have been like, having lost his mother so young.  Or how their father coped with the two infants, whilst also trying to earn a living.  I suspect other members of the family may have helped out, but actual evidence for what happened is in short supply.  Perhaps there were 'non-family' carers involved - there has to be some explanation for the anomalies I found in the local school records.  On the 3rd February 1908, Donald was admitted into All Saints Infants School but the school registers show his father's name as Donald Martin Parry, rather than as John, and Donald himself is recorded as Donald in one register and Albert Donald in another.

Although his sister was over a year younger than Donald, she was admitted to the infants school just three months after Donald, in May 1908.  Was this as a result of a difficult home situation?

There are errors in her school admission entry, as well, with the father's name recorded as Donald Martin again and her birth date entered as the 19th April, rather than the 5th.  She is also named Jane, rather than Rosina - but I do know she was called 'Joan' throughout her later life, so potentially this is not an error, but the name used for her from infancy.

One of my aims of following the 52ancestors series is to help me organise and record the information I currently have on my family.  But it also serves to indicate where more research is required. And clearly there is a need to investigate these errors in the school records further, if possible, so I have added an item to my Research Log, to look for any school log books which might reference the family and clarify the details on the admission registers.

I don't have any major concerns about the errors though, as the address for the entries, 104 Widemarsh Street, ties in with where Donald and 'Jane' appear in the 1911 census, with their names as per their birth certificates:

However, the 1911 census provided me with a mystery, which you might have noticed - where is their father, John PARRY?

He wasn't actually difficult to find.  What was more difficult was identifying why he was where he was - since he was in Hereford Gaol!

I'll save the full story (or as much as I currently know of it) for when I write about John.  But, just to avoid too much suspense, he was jailed on the 24 March 1911, by the Sheriff's Court, for 'Contempt of Court'.  And he wasn't released until the 9th December 1911.

Donald remained at the infants school until March 1912, when he transferred to the 'senior school'. Since he was only aged 8, I assume the 'senior school' was what we would now call a Junior School.  It was possibly "St Owens Council School" as, in 1914, Donald was awarded a prize from there, for regular attendance:

The book was Treasure Island and Kidnapped, by Robert Louis Stevenson (and purchased from C.E.Brunwell, Bookseller, Broad Street, Hereford, according to a little sticker in the back). I wonder if these tales of adventures inspired Donald, who later set off on travels of his own.

I currently don't have any further information about the family until 1918 - in November of that year, Donald and Rosina's father, John, passed away, aged 53.  It must have been a very difficult year for them, as their grandfather, Thomas PARRY, had also died in the February.  I don't know what level of contact Donald had had with his grandfather, but it is possible that Thomas had been living with John and his family in the months prior to Thomas's death, although he actually died elsewhere. (This is based on the address given on Thomas's probate entry.)

What happened to Donald and Rosina after their father, John, died?

As far as I can gather, Rosina was looked after by an aunt and Donald was sent to stay with (and potentially work for) other relatives.  Almost five years later, in September 1923, Donald obtained the Grant of Administration for his father, whose effects were £30 14s 3d.  Then, in April 1924, he emigrated to Canada, in the company of a Rowland Thomas LEWIS.  We believe the two of them were on an agricultural scheme, but the arrangements are unclear - according to the Form 30a, Donald had paid his own fare across.

I haven't yet discovered when Donald returned to the UK, but he was here by February 1927, when he married Elsie May THOMAS.  Donald and Elsie lived for a while in Hanbury, Worcestershire, where they appear in the 1939 register. Later they moved to Herefordshire and then down to Cornwall, before eventually moving back to Herefordshire, and finally Worcestershire.

Donald and Elsie were able to come to my wedding.  We saw them in the months following that, when they chose some of the wedding photographs that they wished to have copies of. Sadly, Donald passed away before we had a chance to give them the copies.  I remember leaving the photographs with one of my aunts, after Donald's funeral, for her to give to Elsie at a more appropriate time.

It was one of their last 'days out' together.

Writing this has made me aware of how much information there is about my closest ancestors that still needs to be compiled properly - including any recollections of Donald and Elsie that those of us still alive might have.  The further back we go, the less detail we are likely to find out about our ancestors - so I think it's important that we record as much as we can about those we did know, and pass that information on to the generations to come.

Monday, 15 January 2018

Another potentially identified DNA connection

Isn't it nice when things just work out?

I haven't done much regarding DNA over the past month or so, due to other activities.  But I have tried to keep up with the "new" events, such as the MyHeritage changes.  I'll write more about my results at that site at another time - this post is about an Ancestry find.

Late last night, (probably too late, I should have been on my way to bed, but you know that thought, "I'll just check one more thing" 🙂) I decided to look at how many '4th cousin and closer' matches I have on Ancestry.  I thought it would probably be 81, which is what it went up to a week ago. But the numbers have been increasing more rapidly recently, with five new matches in that category since the beginning of the year, so I am ever hopeful of an increase.

The total was 82!

I quickly searched for the new match -  no tree and only a 'good' confidence level, with 22.7 centimorgans shared across 3 DNA segments.  That could mean three segments at about 7.5cM each, or it could be one longer segment and a couple of smaller ones.  I won't know unless they transfer their data to another site.  Still, it would be worth following up when I get time.

But then I looked for any shared matches.  Often there are none, as shared matches only show for matches in the "4th cousin and closer" category so, if this match also matches some of my more distant matches, the more distant ones won't show up on this person's profile.  But, this time, there was one shared match shown, predicted 'high confidence', with 38cM shared across 2 DNA segments.  And with a tree of eleven people.

I keep a running total of the numbers of matches I have, as well as noting the names of new matches and anything interesting about them (like whether they have a family tree, or a surname in common with me). So I could tell that the shared match had appeared on the 9th of January and, at that time, was not showing a family tree.  So I am fortunate in that it looks like they are interested in finding out more about their ancestry, as they have taken the trouble to add some family details.

There were two surnames in common with me, LEWIS in Wales and ALLEN in London.  The Welsh one was not in one of "my" counties, so I took a closer look at the ALLEN first.

There were no dates, just the location for the one female ALLEN's birth in London.  But her marriage was shown, so that gave me her husband's name.  Armed with that information, I was able to identify their marriage, in 1926, on Ancestry.  London records are well represented on the site so I didn't just find the civil registration index but also an image of the actual parish register.  That gave me the bride's father's details, Herbert Henry ALLEN, a poulterer.  As the bride's age was shown on the certificate, it didn't take long to find the family in the 1911 census, Herbert Henry (33), with wife, Ada (32), and children, Edward (12), Florence (11), Herbert Henry (10), Frederick (7), Joseph (6), Dorothy Violet (5), Cyril James (4), Bessie Maud (3) and Frank Reuben (1).  From there I checked the 1901 census, which showed Herbert and Ada, along with the two older children.  Herbert's birthplace was Lambeth in both censuses.  Ada's and the children's varied from Lambeth to Brixton and Stockwell, but these are fairly closely connected areas in south west London, and all familiar from my own family.

The next step was to identify the marriage of Herbert Henry ALLEN to Ada - I used FreeBMD for that and found that the most probable entry was in September 1898, in Camberwell.  Back to Ancestry to search for the church records.  Yes, again the entry was there - Herbert Henry ALLEN, aged 20, married Ada SPRINKS on September 12, 1898.  Herbert's father was a John ALLEN, Perambulator Maker.

Now that's exciting - because my John Prosser ALLEN, snr, was also a perambulator maker. And, on February 10th, 1878, my John, with his wife, Sarah, christened their son, Herbert Henry ALLEN!

Obviously, I need to continue to work through the details, and check for my John and Sarah in records such as the censuses, to make sure their Herbert is with them, or not, as appropriate, and that there's no evidence to suggest this isn't the right connection to my DNA match.  I also need to contact the shared match who appeared on my list yesterday, to confirm whether or not they connect to the same family line.  And, of course, it would be great if both matches transferred their raw data to one of the other DNA sites, so that we can check exactly where we match on the DNA.  That would also mean I could look for more evidence, for or against the connection, amongst my other DNA matches.

ALLEN is a fairly common surname, so I don't follow up general references to it on my DNA matches' surname lists - but, who knows, if these two matches do transfer their data, perhaps there'll be others matching over the same segments and with the same surname.  I'd certainly be following those up then!

Just going back to the quantity of DNA shared - 38cM is the average for 4th cousins (based on Blaine Bettinger's Shared cM Project*) whereas we actually appear to be 3rd cousins.  So the shared DNA is a bit on the low side, but well within the range.  The match with 22.7cM could be more distant, but I am hopeful that they will still be within the range of my genealogy!

(And I did eventually get to bed last night - although it was 'today' rather than yesterday!)

Blaine Bettinger's Shared cM Project - https://thegeneticgenealogist.com/
Interactive Tool by Jonny Perl - https://dnapainter.com/tools/sharedcm

Friday, 5 January 2018

52 ancestors in 52 weeks - Week 1 - Start

"Let's start at the very beginning..."
Perhaps I am being a little unimaginative, but I'm going to take several of Amy's suggested starting points and make my first "52 ancestors" post about myself.  After all, we're always advised to begin our family history with ourselves, and I am the "Home Person" on my Ancestry public tree.

But then again, I am not actually one of my own ancestors - so I'm also going to include my parents in this post.  Although they are both deceased, it still seems too close to publish much online about them, from a privacy point of view, so covering all three of us at once means I can then move on to the more distant ancestors, knowing I have at least mentioned us all in the series.

For those who don't know, the "52 Ancestors in 52 Weeks" is a series of weekly prompts, produced by Amy Johnson Crow, aimed at helping genealogists share their research about their ancestors.  I did consider taking part in the series some years ago, when I began the Genealogy Do-over. At the time, I'd recently acquired all of my parents' family history records and I was planning to work through them, starting with myself, in order to confirm, and add to, Mum and Dad's research.

Unfortunately, I got "bogged down" after about week 6 of the Do-over, and never started the actual "family history" tasks (although I learnt a lot about tools and techniques during those first few weeks, which definitely came in handy for some of the other activities I had going on then!)  You can find my 'Do-Over' posts earlier in this blog.  My more recent posts here (if you can call them "recent"!) have related to my DNA research.  Once again this is something that I find easy to get bogged down with, as DNA can rapidly become complicated, especially for those of us who are not particularly 'technologically minded' and who have to work hard at understanding what all the various tools can do. 

But one of the things I have learnt, in all the years since first taking a DNA test, is that family history is important!  DNA alone will not produce all the answers.  It needs to be combined with genealogy - so hopefully, this year will be the year when I really feel I demonstrate some proper "genetic genealogy"!

Anyway, back to my "start".

Obviously my parents were present - and they continued to be responsible for many of my "starts" in life. Particular memories for me include my first driving experiences - steering an old Bedford van, whilst sitting on Dad's lap (I couldn't reach the pedals!) and, as soon as I was legally old enough to drive, giving Mum a fright when I turned a corner rather abruptly, after she'd bravely allowed me to drive her car on a disused airfield.  Mum and Dad were both responsible for my enjoyment of gardening - I have many happy memories of visits to garden centres, and certain plants will forever be associated with particular experiences involving my parents.  They were also both responsible for my interest in photography, another of their joint hobbies.  A camera was passed down to me when I was merely six or seven and, again, specific memories are intrinsically linked with the two of them, such as photographing lightning in Singapore, and doing our own developing and printing at home.

Of course, the combination of these two hobbies does have its downside, as I now have thousands of photographs of flowers to deal with!


Mum was creative - I didn't inherit any of her musical skills but I like to think that some of her practical side has rubbed off on me, for general handicrafts and (potentially!) model making.  Dad was also practical but more studious. He was responsible for my interest in archaeology - I remember the two of us watching Mortimer Wheeler on television when I was a teenager.  Dad was also the one who began our family history research, back in the early 1980s. 

And he was the one who first mentioned DNA to me, around the year 2000, asking me if I knew anything about it. By then I had begun researching and was concentrating on our surname of Parry, since that was the one Dad had got stuck on and Mum and Dad were both working on all of the other branches.  One of my regrets is that my response was to say, no, I didn't know about DNA and didn't see how it could be used with a multiple origin surname like ours. 

If only I had asked what had he read and did he want to do it.......

Dad passed away within a year of that conversation.  Fortunately for me, seven years later, when I had finally learnt a bit about DNA, another male relative was willing to take a Y-DNA test.  But what a missed opportunity, to have been in there right in the early days of genetic genealogy.  Who knows what situation my DNA surname research would have been in now if I had acted differently?

But there's no point looking back at what might have been.  I'm what's known as a "RAF BRAT", so am fairly used to moving on without allowing regrets to build up.  And I am so grateful for the wealth of experiences my parents gave me, and for the treasures I still have to explore, within their research, as I begin this year's journey to increase my knowledge about all of my ancestors.

Amy Johnson Crow - https://www.amyjohnsoncrow.com/52-ancestors-in-52-weeks/

Wednesday, 9 August 2017

Shared matches - matches who match both my paternal and maternal lines

This is just a quick post, to show the information I looked at, in order to reply to a question Debbie Kennett asked on the ISOGG DNA-NEWBIE mailing list.  The question was "how many people have double matches in their tree, ie, where a person has a match with both your mother and your father."

Now, I don't have my father tested - he passed away in 2001.  However, I do have all four of his siblings tested, as well as a paternal first cousin of theirs.  So, whilst it's not quite the same, as I know there are still some areas of my chromosomes where none of my Dad's relatives match me, the data should give me a reasonable indication of the overlap between my matches and the two different sides of my family (which, as far as I am aware, are not related to each other).

I had actually noticed this 'matching to both sides' some time ago, when I first started playing about with my FTDNA "in common with" (ICW) data and the Pajek program (which I mentioned in my previous post) just to see what the program did.  I realised then that eleven of my matches seemed to match both sides of my family.  Yesterday, I decided to check the current situation in order to answer Debbie's question.

To do this, I used the DNAGedcom Client app to download my ICW file from FTDNA.  I then extracted all the matches who are in common between me and my six relatives (my mother, Dad's four siblings, and their paternal 1c).  I then used the Pajek program to display the information.  This first image was produced using the options "Energy: Kamada-Kawai: separate components"

The program can display the names associated with each point, but I have obviously removed those for privacy reasons.  It is quite clear that there are two main clusters, with sixteen matches spanning the two groups. I spread those sixteen out manually, to make them more obvious, but it's not very easy to see what is happening within the two groups, so next I tried the options "Energy: Fruchterman Reingold: 3D".  Again, I've straightened out the sixteen matches in the middle and this time allocated reference numbers to them, as well as to my relatives:

In this image, as well as the sixteen matches who link to both the paternal and maternal sides of my family, the clusters of matches for each of my father's relatives are more distinct.

(The same information can also be discovered by using a spreadsheet containing all six of the ICW files combined together and creating a pivot table with match names down the rows, and my relatives as the column headings, the table then showing a count of the match names.  By filtering on all those who match my mother, and gradually working through all those who match one or more of my paternal relatives, the full list of people matching both sides of my family can be obtained.

Doing this in a pivot table has the additional benefit that, once the list of people who match both sides is completed, it can be used to pick out the same people from the chromosome browser (CB) file*, so that the actual nature of the matching segments can be examined.)
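For anyone who prefers a short script to a pivot table, the same "who matches both sides" check can be sketched in Python. This is only an illustration under stated assumptions: the relative labels, the match names and the (relative, match) row layout are all invented here, and are not the actual column layout of the FTDNA or DNAGedcom files.

```python
from collections import defaultdict

# Hypothetical rows from the combined ICW files: (relative, match name).
# All names below are invented for illustration.
icw_rows = [
    ("Mother", "M01"),
    ("Mother", "M02"),
    ("Aunt A", "M02"),
    ("Uncle B", "M03"),
    ("Paternal 1c", "M02"),
    ("Mother", "M04"),
    ("Uncle B", "M04"),
]

# Assumed labels for the six tested relatives.
MATERNAL = {"Mother"}
PATERNAL = {"Aunt A", "Aunt C", "Uncle B", "Uncle D", "Paternal 1c"}


def matches_on_both_sides(rows, maternal, paternal):
    """Return, sorted, the matches in common with at least one
    maternal relative and at least one paternal relative."""
    relatives_by_match = defaultdict(set)
    for relative, match in rows:
        relatives_by_match[match].add(relative)
    return sorted(
        match
        for match, relatives in relatives_by_match.items()
        if relatives & maternal and relatives & paternal
    )


both = matches_on_both_sides(icw_rows, MATERNAL, PATERNAL)
print(both)
```

With the invented rows above, only M02 and M04 appear against both the mother and a paternal relative, so those are the two names returned.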

I've allocated matches to the maternal or paternal sides of my family on the basis of who shares the same segment as the match does to me.  However, in the cases where I share two segments with a match (M11 and M14) the segments are each shared by different sides of my family, so that it appears I connect to those matches through both the paternal and the maternal sides of my family:

It will be interesting to see if those matches turn out to be genuine!

Pajek Quick Reference sheet

Picking out the CB data for the "Both" people
There are probably several ways of doing this but, as I am sure there are other people in the same situation that I am in, having to learn it as I go along (and relearn it every time I want to do something similar!) these are the details of what I did:
Having cut and pasted the list of people identified as matching both sides into the first column of a new spreadsheet in the CB file, I pasted the following formula into an empty cell alongside the first match in the CB spreadsheet (replacing the bracketed text with the appropriate information):  =VLOOKUP([the cell reference of the Full name column for the first match in the CB spreadsheet],'[the name of the new spreadsheet containing the list of people matching both sides]'!A:A,1,FALSE) , where the A is the column in the new spreadsheet containing the list of names who match both sides, so make sure that list is pasted into the first column, labelled A.  I then used 'Fill down' to copy the formula to all the cells in the CB column.  The result is that the cells show either #N/A, if that match is not on the "Both" list, or the name of the match, if they are on the "Both" list.  I then used the filter function to show just the rows with the match name in and copied all the CB data for those matches into a new spreadsheet, which I used to create the table where I have allocated the matches to maternal and paternal sides.
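The VLOOKUP-and-filter steps above amount to keeping only the CB rows whose name is on the "Both" list. A minimal Python sketch of that idea follows; the row layout (name, chromosome, start, end, cM) and all the values are assumptions made up for the example, not the real CB file format.

```python
# Hypothetical chromosome browser rows:
# (full name, chromosome, start, end, cM). Values are invented.
cb_rows = [
    ("M01", 1, 1000, 5000, 8.2),
    ("M02", 3, 2000, 9000, 12.5),
    ("M02", 7, 100, 4000, 9.1),
    ("M03", 5, 300, 800, 7.7),
]

# The list of people identified as matching both sides (assumed names).
both_names = {"M02", "M04"}


def pick_cb_rows(rows, names):
    """Keep only the CB rows whose full name is on the 'Both' list,
    the same effect as filtering out the #N/A results of the VLOOKUP."""
    return [row for row in rows if row[0] in names]


selected = pick_cb_rows(cb_rows, both_names)
for row in selected:
    print(row)
```

A name on the "Both" list with no CB rows (M04 here) simply produces nothing, just as it would never show up in the filtered spreadsheet.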

Ancestry shared matches and a new connection

This post continues my general theme of looking for strategies to deal with my DNA results - in this case, results from AncestryDNA.

I have 225 pages of matches at Ancestry, which equates to almost 11,250 matches.  I use the DNAGedcom Client app to download the information.  That gives me three files - a list of my matches, a file showing which of the matches are in common with each other (based around fourth cousins and closer only), and details from my matches' trees.  This latter, 'ancestors', file has over 345,000 lines of data in it, which seems a staggering amount to consider dealing with - especially as, unfortunately, most of it is probably not relevant to my connections with my matches, as the majority of them are in the USA and few have traced their connection back to the UK, which is where most of my pedigree information relates to.

Although I do have three Ancestry Hints, which have been helpful, I don't appear in any 'DNA Circles'.  So I've been looking at the "shared matches", to see what clues I can garner from those. Ancestry provides details of my matches that are fourth cousins and closer, and indicates where they share DNA with another of my close matches.   They do also show the more distant matches that are shared matches to the closer cousins - but only by showing the closer match on the more distant match's profile.  Given how many thousands of distant matches I have, I do not check each of their profiles individually to see if they just happen to match a closer cousin.  So the app download makes this feature more useful, by picking up those more distant matches who are in common with the fourth cousins, as well as providing the information in a more convenient, (ie spreadsheet) format.

I have 59 matches within the '4th cousins or closer' category and 379 rows in the ICW* file downloaded by the Client app, which, as far as I am aware, includes each individual who connects to one of my '4th cousins or closer' matches.  That's probably not many in comparison to people with colonial US ancestry but I imagine it's about average for those of us in the UK.  And it is enough to do some simple 'network analysis', which I hope might allow me to make more sense of the data.

Let me say here that I don't really know anything about proper network analysis - I think that's complicated computing, with thousands of entries, which produces things like the Genetic Communities.  It involves lots of statistical calculations and terms that I don't even understand the meaning of, let alone know how to use! But most of us are probably capable of using some simple techniques - the basic concept for what I am doing I learnt when studying for a GCSE in psychology, so that's a qualification designed for teenagers. In that course, we were using it to analyse friendship patterns in a class of schoolchildren.  The "sociometric" technique simply consisted of asking each child in a class who their three best friends in the class were.  One then drew a diagram something like the following, where each dot is a person and the arrow shows the direction of the 'choice'.

It occurred to me some years ago that this type of diagram could possibly be used to help analyse genealogical networks and I had hoped to use it in my Parry One-Name Study to try to sort out the potential relationships among the lower gentry of Herefordshire (which contains numerous Parry connections that may, or may not, relate to the same Parry family). I came across a (free!) program* that looked like it would be useful for actually drawing the diagram (although it is easy to do by hand, if there's a lot to draw, a computer obviously does make it easier) but I never managed to get all the pedigrees typed up sufficiently to try it out for my study.  Now, with doing genetic genealogy, it seems to me that the same principle could be used with shared matches.

And so the following diagram shows the connections between my shared matches at Ancestry:

In this image, each red dot represents one of my matches, and the blue lines indicate the other matches that they also match.  I am not using arrows, just lines, as the genetic relationships will be in both directions.

As you can see, the matches fall into groups. Sometimes these are made up of just two or three people who are shared matches with each other.  But there are also some larger groups, one of about 50 connections, and the other with over 150 connections.

It was interesting to see how the data plotted, but how does this help me?

Well, my theory, as you've possibly guessed by now, is that the people in the same group are likely to connect to me (at some level) through the same ancestral line.
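The grouping itself, working out which matches end up joined together by the lines on the chart, can also be done without drawing anything. The sketch below is an assumption-laden illustration rather than what Pajek does internally: it treats each row of a shared-matches file as a pair of names (invented here) and numbers every set of people who are linked, directly or through other matches.

```python
from collections import defaultdict, deque

# Hypothetical shared-match pairs, one per ICW row. Names are invented.
icw_pairs = [
    ("A", "B"),
    ("B", "C"),
    ("D", "E"),
    ("F", "G"),
    ("G", "H"),
    ("H", "F"),
]


def group_matches(pairs):
    """Give every match a group number, so that two matches share a
    number whenever a chain of shared matches links them."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        # Lines, not arrows: the relationship runs in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups = {}
    next_group = 1
    for start in sorted(neighbours):
        if start in groups:
            continue
        # Breadth-first walk outwards from an ungrouped match.
        groups[start] = next_group
        queue = deque([start])
        while queue:
            person = queue.popleft()
            for other in neighbours[person]:
                if other not in groups:
                    groups[other] = next_group
                    queue.append(other)
        next_group += 1
    return groups


groups = group_matches(icw_pairs)
for name in sorted(groups):
    print(name, groups[name])
```

With the invented pairs, A, B and C fall into one group, D and E into a second, and F, G and H into a third, which is the same kind of result as reading the clusters off the diagram.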

So, firstly, I allocated everyone in each group an 'AncestryICW Group Number' (both in the Notes section of my view of their DNA profile on Ancestry and in my spreadsheet) to help me keep track of the Groups.   I also added any information about potential surname connections.  Here's the same diagram, with those numbers added and also some additional symbols based on my family history. (Key in the bottom right corner of image)

As you can see, the Group 1 (derived just from the genetic relationships provided by Ancestry), contains two people who share the surname NAYLOR with me. One of these I have discovered the potential connection to, the other currently just has the surname in common with me.

I've also 'starred' one match - over the weekend, I carried out a new download of the shared matches file. There were 32 new rows added since the previous download, which, once charted, increased the size of some of the existing groups and also created a few new ones.  (NB these are not new 'fourth cousins or closer' - these are more distantly related new matches, who just happen to connect to my fourth cousins and closer.  As such, I would not normally have checked them out, among the many new distant matches that keep being added.)

I was just starting to work through them, adding the group numbers to my spreadsheet and checking if the people had trees attached to their account, when I noticed the surname NAYLOR.  Yes, one of the new additional matches in Group 1 also had a NAYLOR in their tree!  It was just one, a NAYLOR female marrying into their SMITH family, with no other information about her except her husband's name, and their child's details.  And the family were in the 'wrong' place in the UK (up in Lancashire, rather than in London) - but obviously I didn't leave it there.

By initially working on the husband of the SMITH child, and then finding him and his wife in the 1939 Register, I was able to obtain her proper birth date (1895, not 1885 as shown on the pedigree). That correction meant that I could then find her in the 1901 and 1911 censuses with her parents - her mother being the NAYLOR by birth. Those censuses gave me sufficient information to get back to the previous generation - who traced back to London and the entries I believe relate to my family in 1841!

All of this still needs confirming properly, especially the early censuses for the family, which I had found some months ago when identifying the other NAYLOR connection, who is in Australia.

But it all looks very promising that my new match and I are fourth cousins through the NAYLOR line.

So, just the process of simply grouping my shared matches, on the basis of who they are in common with, has been sufficient for me to spot a connection that I may not have seen otherwise, since the new match was identified by Ancestry as a more distant 5th-8th cousin, sharing just 9.7cM across 1 DNA segment. Although I understand that there may be other reasons for shared DNA of that quantity, unless I can find other evidence to contradict it, the simplest explanation, that the three matches in Group 1 who all share the NAYLOR surname with me obtained it from a common NAYLOR ancestry, does seem to be logical.

Network analysis program used for drawing chart: Pajek (http://mrvar.fdv.uni-lj.si/pajek/)  [One day, I hope to learn to use the program properly, as I am sure it could potentially display the DNA information more effectively, taking account of features such as the closeness of relationships etc]

ICW - stands for "in common with" - the term often used for matches who also match someone else you match.

Friday, 4 August 2017

Autosomal DNA Discussions - and some statistics for my kits

There have been some interesting discussions on the mailing lists recently*, which have caused me to look at some statistics for the kits I manage.  On the one hand, there were the, seemingly straightforward, questions concerning the best strategy for dealing with autosomal DNA results, and how to manage the ever increasing influx of new results.  Answers to these questions tend to include the importance of sharing multiple segments and of limiting the minimum length of the segments worked with, as well as focusing on names and locations relevant to one’s own family history.

But, on the other hand, the ongoing debate, predominantly between two people who I regard as genetic genealogy experts, Debbie Kennett and Tim Janzen, shows that things can be far from “straightforward” when dealing with DNA.  Alongside issues of terminology (what do we actually mean when we say “identical by state”, or “identical by descent” etc.), and how far back shared ancestry might be for particular levels of shared DNA (even up to 10 or 20 generations), such discussions often revolve around the problem of “triangulating groups”* (TGs) – what causes them, how relevant they are (or aren't), and the factors that affect them (such as segment size, phasing, haplotype frequency, and the population that’s involved).  

Fundamentally, the problem seems to be that scientific modelling suggests TGs shouldn’t exist, as it’s thought to be “mathematically impossible for so many people to share the same segment by virtue of sharing a single ancestral couple.”* But many people's results seem to indicate that they do exist – so why?

I don’t have the answer to that question, obviously, and I've written before about the two differing theories (at http://notjusttheparrys.blogspot.co.uk/2016/11/dna-update.html).  But two comments in particular struck me, as I realised that I hadn't specifically examined my kits with these issues in mind.  First was Tim’s comment that half identical regions (ie matching segments) that are at least 15 cMs in length and contain at least 2000 SNPs will almost always be "identical by descent" (IBD) and, secondly, Debbie’s comment that, in her experience with UK matches, the only segments that fall into triangulated groups are small segments under 15 cMs, and that we would be better off focusing our attention on matches that share over 15 cMs.

Debbie and I have discussed the numbers of TGs we have before, so I know my results show a few more than hers do, but this has prompted me to take a detailed look at my kits, to see the effect of applying such thresholds.

I began with FTDNA, where I have access to seven UK kits.  The following graph shows the numbers of matches I have with particular “longest segment” lengths, annotated for any known relatives:

These graphs show a group of four siblings and the numbers of matches they each have with particular “longest segment” lengths, annotated for any known relatives:

And finally in this section, graphs for the three other kits I have access to:

The following table summarises how many matches each of the above kits would have to work with, if either a 15cM or a 20cM threshold was applied:

So applying such thresholds would certainly reduce the number of matches regarded as 'relevant' and make working with the results more manageable.
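The counting behind a summary like this is simple enough to sketch in Python. The segment lengths below are invented purely for illustration (they are not taken from any of my kits), and `count_at_or_above` is just a name I've made up for the helper.

```python
# Hypothetical "longest segment" lengths (in cM) for one kit's matches.
longest_segments_cm = [7.8, 9.7, 11.2, 14.9, 15.0, 18.3, 22.5, 41.0]

def count_at_or_above(lengths, threshold_cm):
    """Count the matches whose longest segment is at least threshold_cm."""
    return sum(1 for length in lengths if length >= threshold_cm)

# How many matches would be left to work with at each threshold?
remaining = {t: count_at_or_above(longest_segments_cm, t) for t in (15, 20)}

print(f"All matches: {len(longest_segments_cm)}")
for threshold, count in remaining.items():
    print(f"At {threshold}cM or more: {count}")
```

Note that the 9.7cM match in this made-up list would disappear at either threshold, which is exactly the kind of loss to weigh up before applying a cut-off.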

But would we be missing useful information, as demonstrated by the 4c1r matching kit N?

I had hoped to produce similar graphs for the two kits I manage at 23andMe but, as the download file doesn't include details of the "longest segment" for matches sharing multiple segments, the following graphs include all segments for those matches my mother and I are sharing with (or who are "Open Sharing").  (I have removed the parent/child segment data to avoid an 'extended tail' in the graphs.)

The "curves" of the 23andMe graphs are much more irregular than the FTDNA kits, which could be a feature of the differences in the nature of sharing between the two companies.

But, once again, it is clear that applying thresholds of 15cM or 20cM would dramatically reduce the number of segments left to work with.

As a slight sidetrack, in view of another question on a mailing list, concerning the numbers of matches that don't match parents, I just thought I'd add in a graph to show the numbers of N's segments that are from matches identified as also matching N's mother.

As you can see, the number of non-maternal matches is generally greater than the number of maternal, which possibly indicates that there is some level of false positives in the results.  However, it could also just be a sign that N's paternal side of her family has more matches in the databases - something which is supported by the higher numbers of matches N's paternal relatives have at FTDNA in comparison to N and her mother.   But the important issue is that the segment lengths for the matches identified as maternal do go all the way down to 5cM.  It seems to me therefore, that it would not be easy to distinguish which segments may be false positives (ie people identified as matches who are not genuine matches), based just on maternal/paternal matching.  With such short segment lengths, it is possible that the parents' results are showing false negatives (ie genuine matches not identified as matches in the parent for some reason.)

Back to the original questions.  Of course, segment length isn't the only consideration - Tim's criteria included the numbers of SNPs as well.  The following scattergram shows numbers of SNPs per segment length, with the shaded area being those who would meet the criteria of being "at least 15 cMs in length and containing at least 2000 SNPs".

There are 165 segments (out of 1920) that would meet Tim's criteria for being almost always IBD (and 32 segments, if the segment length used was 20cM).
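Applying both parts of the criteria together is a matter of keeping only the segments that pass the length test and the SNP test. The sketch below assumes each segment has been recorded with its length in cM and its SNP count; the `Segment` layout and the sample values are hypothetical, not the column names of any company's download file.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    match: str
    chromosome: str
    length_cm: float
    snps: int

# Invented sample data for illustration only.
segments = [
    Segment("Match A", "4", 21.3, 4100),
    Segment("Match B", "15", 27.0, 1500),  # long, but a low SNP count
    Segment("Match C", "8", 15.2, 2300),
    Segment("Match D", "8", 9.7, 2700),    # plenty of SNPs, but short
]

def meets_criteria(segment, min_cm=15.0, min_snps=2000):
    """True when the segment passes both the length and the SNP threshold."""
    return segment.length_cm >= min_cm and segment.snps >= min_snps

passing_15 = [s.match for s in segments if meets_criteria(s)]
passing_20 = [s.match for s in segments if meets_criteria(s, min_cm=20.0)]

print(f"15cM and 2000 SNPs: {passing_15}")
print(f"20cM and 2000 SNPs: {passing_20}")
```

Segments like "Match B" here, which pass on length but fail on SNP count, are the ones that need looking at individually.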

The 23andMe graph for N shows an unexpected peak at 27cM, which the scattergram indicates is made up of some segments with fewer than 2000 SNPs.  Closer analysis shows these are predominantly at the start of chromosome 15 and, using the ADSA tool*, it can be seen that all but one of the segments fully triangulate as a non-maternal TG.

Is it a genuine segment (ie descended to all the matches from a shared ancestor)?  The low SNP count might imply not, but the apparent phasing and the fact that it is at the start of the chromosome (where recombination is perhaps less likely), as well as it being over 15cM, may be factors in favour of it being so.

But the honest truth is, I currently don't know - with factors both for and against it, I often think the only way to tell if a segment is genealogically relevant is if one finds a genealogical connection!

So, what about any other triangulating groups I might have?  I started by using the more restrictive thresholds of 20cM and 2000 SNPs.  At these levels, my FTDNA kit showed one TG:

However, three of the matches are clearly related to each other, so the TG actually only consists of three separate ancestral lines (theirs, mine and the fourth match's).  When I added my close relatives in, all four of these matches show as paternal matches.  Reducing the threshold to 15cM (but maintaining the SNP threshold at 2000) picks up another member of the one family, and reducing it to 10cM picks up one other match (10cM, 2700 SNPs), who triangulates with all of the others.

On two other chromosomes, at 20cM, there is a match who shows as matching my 1c1r so, whilst not creating a TG, these do give me hints as to the relevant ancestral lines there.

In addition to the above TG, reducing the threshold to 15cM produces TGs on ten other chromosomes with my FTDNA kit.  These can be identified as either paternal or maternal based on matching to relatives (who aren't shown, in order to keep the diagrams easy to read):

I do think some of these TGs look "too perfect" - for example, see chromosome 8, where twelve people show identical figures.

Decreasing the thresholds below 15cM increases the numbers of matches in these TGs, as well as producing more TGs, but many look too regular, given the random nature of DNA transmission. The use of matching to close relatives to 'phase' the segments should indicate a genuine matching sequence on one chromosome out of a pair (rather than a "match" being created from criss-crossing between SNPs on the two chromosomes in a pair).  But I do have a nagging suspicion that something may not be right, when all of the matches over any particular segment seem to be on just one chromosome, rather than there being overlapping maternal and paternal TGs - although, occasionally, that pattern of two overlapping TGs can be found, as in this example from chromosome 4:

Moving on to my 23andMe kit, at 20cM and 2000 SNPs that shows two TGs:

Chromosome 4 (maternal TG)

And on the X chromosome, a paternal TG:

I did think there was a third TG, on chromosome 11:

But, on checking the profiles, I discovered the two matches are identical twins, so that means there are just two ancestral lines involved (mine and theirs), and so this doesn't make a TG.

It is also a timely reminder that DNA results should always be analysed in conjunction with the genealogy!
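The "how many separate lines are really here?" check can be written down as a small Python sketch. It assumes you already know, from the genealogy, which matches are close relatives of each other, and it follows the reasoning used above that a group needs at least three separate lines (mine plus two unrelated matches) to count as a TG. The `family_of` mapping and the match names are hypothetical.

```python
def distinct_lines(matches, family_of):
    """Count ancestral lines: the tester's own, plus one per unrelated family.

    Matches known to be close relatives of each other share a family label
    in family_of; any match not listed there counts as a line of its own.
    """
    families = {family_of.get(match, match) for match in matches}
    return 1 + len(families)

def is_triangulated_group(matches, family_of, min_lines=3):
    """Apply the working rule of at least three separate ancestral lines."""
    return distinct_lines(matches, family_of) >= min_lines

# Two matches who turn out to be identical twins: only two lines in total.
twins = {"Twin 1": "twin family", "Twin 2": "twin family"}
twin_lines = distinct_lines(["Twin 1", "Twin 2"], twins)

# Four matches, three of whom are related to each other: three lines.
related = {"Rel 1": "family X", "Rel 2": "family X", "Rel 3": "family X"}
mixed_lines = distinct_lines(["Rel 1", "Rel 2", "Rel 3", "Other"], related)

print(twin_lines, is_triangulated_group(["Twin 1", "Twin 2"], twins))
print(mixed_lines, is_triangulated_group(["Rel 1", "Rel 2", "Rel 3", "Other"], related))
```

The code can only be as good as the family labels fed into it, which comes back to checking the profiles and trees of the matches.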

Rerunning the 23andMe data using thresholds of 15cM and 2000 SNPs produces TGs on an additional eleven chromosomes.  This time, I have included details of how my mother matches, since she's the only close relative at 23andMe, so it doesn't complicate the images too much and does make the phasing more obvious.

So, in my results, I do have some TGs above 15cM and 2000 SNPs.  But I am not convinced that they are all valid, based on what they look like in comparison to what I understand about the random nature of DNA transmission.  I do need to work through the groups, to see if there are any obvious explanations for the anomalies and "overly perfect" matching (as in the case of the identical twins above.)   There are probably some other investigations I could do with the data, for example, checking for runs of homozygosity (sequences of identical SNPs on both chromosomes), which might be affecting matching.
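As a very rough idea of what a check for runs of homozygosity might involve, the Python sketch below scans a list of genotype calls, in chromosome order, for stretches where both alleles are the same. It is a simplification under stated assumptions: the two-letter genotype format, the `--` no-call marker and the `min_snps` cut-off are all assumptions made for the example, and proper tools also take account of position, cM length and a tolerance for genotyping errors.

```python
def homozygous_runs(genotypes, min_snps=5):
    """Return (start, end) index pairs for runs of homozygous calls.

    genotypes is a list of two-letter calls such as "AA" or "AG", in
    chromosome order. No-calls (anything other than two A/C/G/T letters)
    end a run in this simple version.
    """
    runs = []
    start = None
    for index, call in enumerate(genotypes):
        homozygous = len(call) == 2 and call[0] == call[1] and call[0] in "ACGT"
        if homozygous:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_snps:
                runs.append((start, index - 1))
            start = None
    # A run may continue right up to the last SNP in the list.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs

# Invented calls: one run of five homozygous SNPs, then a shorter run.
genotypes = ["AG", "AA", "CC", "TT", "GG", "AA", "CT", "GG", "GG", "--"]
runs = homozygous_runs(genotypes, min_snps=5)
print(runs)
```

With real raw data the threshold would need to be far higher than five SNPs, since short homozygous stretches occur by chance all the time.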

However, I don't think there's much that I, as an individual test taker, can do to find out about how issues such as endogamy, haplotype frequencies, and population segments (which are some of the possible reasons given for why TGs may not be valid), might affect the validity of the TGs appearing in my results.

But trying to test the validity of the comments in the discussions wasn't the point of this post.  My aim was purely to examine my results in the light of those comments, to see what doing so showed, and I feel that carrying out this analysis has been very useful.  It has been helpful to look at ways to make the numbers of matches more manageable and to think about what information I might lose by doing so.   Focusing on these aspects of my results has also caused me to notice things about the data that I had previously missed.  I'm sure the results will also be helpful as I continue to work on the visual phasing and looking at how the segments shared with matches correlate with what might be predicted from that. There are clearly other aspects of my results that I also need to consider, such as matches sharing multiple segments and the company predictions about relationship levels, which I haven't taken account of here.

But, hopefully, all of this combined will enable me to work out some more effective strategies for dealing with my results - which, of course, must include one of the main things I have been reminded of during this process, which is the importance of checking out the genealogy of my matches!

*Discussion References
Corrinne Curtis, Re: [G] Family Finder Kit  (http://archiver.rootsweb.ancestry.com/th/read/GOONS/2017-07/1500365965)

Three discussions on the ISOGG lists (which are not public so I won't post the links - membership of ISOGG is free though, so please join - see https://isogg.org/ ) The discussions are in the threads [ISOGG] Autosomal Survey, [DNA-NEWBIE] Spreadsheets and new matches, and [DNA-NEWBIE] Re: Single Large Segments

Ian Logan, [DNA] Falsely positive matches of Autosomal results (http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2017-08/1501660749)

About Triangulation - https://isogg.org/wiki/Triangulation

ADSA tool - https://dnagedcom.com/adsa/index.php

Tuesday, 11 July 2017

AncestryDNA - Genetic Communities

Back in February, when I wrote about my LivingDNA results, I commented on the upcoming release of AncestryDNA's "Genetic Communities" feature, which I'd heard about through others who could see their communities as part of the beta testing.  Unfortunately, general "busy-ness" got in the way of me posting about my own Genetic Communities, when I received them soon after that.  So this is a 'catch up' post.  I'm not going to cover all the details of how the Genetic Communities work - information about that is already available on the blogs of other genetic genealogists, such as Blaine Bettinger* or Debbie Kennett*, or on the Ancestry site itself. In this post I'm just going to focus on my own results and explore how useful (or otherwise) the information might be.

This is from my AncestryDNA Home Page, showing my general ethnicity and also that I am in three of the genetic communities.

Clicking through to view my "genetic ancestry" gives me the details of which communities I am in, and a map showing both the communities and the estimated general ethnicity areas (I only have traces of 'ancestry' from the "three more regions" so they aren't shown in detail.)

There are over 300 Genetic Communities currently available (Blaine Bettinger has provided a pdf of the full list, from a link on his blog), and it is possible to click down from a continental level, to explore what communities have been identified in different regions of the world, by clicking the "view all" button.  However, I find this a bit inconsistent, and potentially "buggy", when trying to explore the regions where I am in a community.

For example, if I look at the "Scots", which I am not part of, all of the communities show separately in white:

But, when I view a region where I am part of a community, I can only see my own community. For example "The Welsh and English West Midlanders" contains three communities:

But I only seem to get shown the one that I am in, when I try to view these:

This is virtually the same view I get when viewing my own Genetic Community, "English in the West Midlands". 

Based on the list provided by Blaine Bettinger, the "Welsh and English West Midlanders" region also contains the "North Walians" and the "South Walians", but I don't seem able to access the view similar to the one I see for the Scots region, showing all three of the communities in the region - although I can (sometimes) see the whole region, if I access it from the drop down on my own genetic communities view above:

For the other two community regions that I am in, the "English Midlanders and Northerners" and the "Southern English", I seem to be in the overall region but not allocated to a more specific community within that, but again, the only view I can obtain is the same as my personal view, so I cannot see what the three more refined communities in each of these regions are.

I would be interested in seeing what the three regions my Genetic Communities are in look like to someone who is not in them.

Comparison to LivingDNA
Since LivingDNA is the only other company that provides ethnicity estimates in fine detail within the UK, I thought it might be interesting to compare the results from them to my Ancestry Genetic Community regions.  My LivingDNA results have been updated since I wrote about them at http://notjusttheparrys.blogspot.co.uk/2017/02/a-slight-sidetrack-my-livingdna-results.html so, for now, I am including an image from both versions of LivingDNA to compare to AncestryDNA's Genetic Communities. (I will do a more detailed post about the updated LivingDNA results later.)

The three Genetic Communities I am in on Ancestry cover a large area of England, but do not include any of Scotland and only cover the border area of Wales.  In some ways, the earlier version of the LivingDNA results was a better match to the Genetic Communities, as it included down into Devon and Cornwall, and did not include much of Scotland, whereas the updated results no longer show any Devon or Cornish DNA, and now include Aberdeenshire.  However, we are talking about fairly low percentages for these counties.  Both Ancestry and LivingDNA place my main 'ancestry' as being from the West Midlands/Welsh Border areas - which does tie in with my known family history.

So I do feel that both companies are identifying connections to similar areas within the UK and, as the details continue to be refined, potentially the results will be very useful in furthering my family history.

Debbie Kennett has pointed out that, given the current predominance of Americans in the database, the Genetic Communities can help those of us in the UK to filter our match lists so as to focus on the more relevant matches, ie those who do have an identifiable connection to the same UK areas that we have.  However, although the Genetic Communities are created initially from the DNA analysis, with pedigrees then being used to supply historical information that helps to 'identify' the community, it isn't necessary to have a pedigree in order to be in a community, so finding the connections to matches who are in communities will usually involve further research (and, ultimately, might still be impossible in some cases). 

But the very fact that a pedigree isn't required, in order to appear in a community, does make the Genetic Communities a useful feature for anyone who does not know their family history, as it can help to identify some "times and places" for them to explore potential connections to their matches.

So, as confirming my family history and discovering new relatives are my main aims in using DNA, how useful are the communities for finding the connections between my matches and my own family history, beyond the general benefit of narrowing down my match lists? 

The story views on the Genetic Communities help to provide more detail about the places where my matches' ancestors were from.

And also where they went to:

And the connection page indicates some of the surnames that are more prominent in the particular community, as well as indicating my own strength of connection to the Community:

(I love the background photo, by the way - definitely a place with relevance to my family history!)

As you can see, there is overlap between the three communities that I am in.

Just as I am in several communities, so are many of my matches.  The following diagram illustrates the numbers of my matches in each of the overlapping Community groupings:

(For anyone who does the maths, yes, there is an inconsistency between the images, with 23 matches being listed as in the "English in the West Midlands" community, and only 22 shown in my diagram - that's because another person was added in the four days between extracting the community match lists to produce the diagram and then copying the "Your Connection" image above.  Keeping data up to date is not easy!)

Since the "English in the West Midlands" is a subset of the "Welsh and English West Midlanders", it does seem strange that two of the matches are in the subset but not in the higher level community (but that's just a minor anomaly that I've noticed, rather than something I'm looking into).
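Both the overlap counts and the "in the subset but not in the higher level" check are easy to reproduce once the match list for each community has been extracted. This Python sketch assumes each list has been saved as a set of match names; the names and memberships shown are invented for the example and are not my real lists.

```python
from collections import Counter

# Hypothetical match lists, one set of match names per community.
communities = {
    "Welsh and English West Midlanders": {"M1", "M2", "M3"},
    "English in the West Midlands": {"M2", "M3", "M4"},
    "Southern English": {"M3", "M5"},
}

def overlap_counts(communities):
    """Count matches for each exact combination of communities they are in."""
    membership = {}
    for name, matches in communities.items():
        for match in matches:
            membership.setdefault(match, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())

counts = overlap_counts(communities)
for combination, count in counts.items():
    print(f"{count} match(es) in: {' + '.join(sorted(combination))}")

# Matches in the more specific community but missing from the wider one.
in_subset_only = (
    communities["English in the West Midlands"]
    - communities["Welsh and English West Midlanders"]
)
print(f"In the subset only: {sorted(in_subset_only)}")
```

Because the lists change as new people test, it is worth extracting all of them on the same day before counting.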

It seems clear that, at the moment, whilst it is helpful to know these matches have a UK connection, the Communities don't necessarily narrow that down to a particular branch of my family - partly because my genetic matches and I might both be in the same multiple communities but also because, as Blaine points out in his post, just because a match shares a particular community with me, it doesn't mean that that is definitely where the shared ancestry is from.  But the Genetic Communities certainly could be helpful 'pointers' to potential connections and I imagine they will also improve over time, so may eventually even hint at specific family lines, especially when combined with other information from known family history and shared matches. 

What about those DNA matches that I have already identified some shared ancestry with - how do the Genetic Communities match up to our shared ancestry? 

Unfortunately, only two of those 'identified matches' appear in the same communities that I am in.  In one case, the match is in three of the communities I am in - the 'Welsh & English West Midlanders', 'English in the West Midlands' and 'English Midlanders and Northerners'.  There is quite an overlap between these three communities anyway, but it is reassuring that our shared ancestry is from around the Bromyard area, in north eastern Herefordshire.  The other match is in both the 'Southern English' and the 'English Midlanders and Northerners'.  In this case, our shared ancestry is in London in the later 1800s and then traces back to Wiltshire by the beginning of that century, so it looks as if the 'Southern English' community may be relevant to this - but, if I didn’t already know the connection, the shared 'English Midlanders and Northerners' could send us looking in the wrong place.

There is one other match who, whilst I don't know exactly how we relate, is known to be related to me on my mother's side, thanks to comparisons at Gedmatch.  They are in both the 'Southern English' and the 'English Midlanders and Northerners', either of which could be relevant to my mother's side of my family.  However, I have noticed that a third match, who is shared between the two of us, is showing as just in the 'Southern English' community, so that may possibly hint at where the shared ancestry is (although that community does take in everything under a line from South Wales to the Wash, so that's hardly narrowing things down :-) )

In another example, I do have a match who is in all four communities that I can see, but is a shared match to someone who is only in one of the four.  So the combination of the Genetic Communities with shared matches may be another topic to explore, to see if it can help indicate the potentially more relevant areas of the country to be researching in. 

However, this may not be without its problems and may still be misleading to me.  For example, I have a match who shows up in just the 'Southern English' community, but both his profile and a shared match indicate there's likely to be a high level of Welsh ancestry.  Since I assume that I am not seeing any communities that my matches are in, but which I am not in, it's possible that they both share in a Welsh community, and it's probably more likely that one of my West Midlands ancestors headed into Wales and connects into their trees that way, than the connection being in the south of England.

Shared matches are something I will write about in a separate post soon, so I shall perhaps consider the combined use of these two tools further in that.  I'm certainly grateful to AncestryDNA for the various tools they provide and look forward to future developments.  

I just know that I still have a lot to learn, to be able to work with the tools effectively!