Wednesday, 9 August 2017

Ancestry shared matches and a new connection

This post continues my general theme of looking for strategies to deal with my DNA results - in this case, results from AncestryDNA.

I have 225 pages of matches at Ancestry, which equates to almost 11,250 matches.  I use the DNAGedcom Client app to download the information.  That gives me three files - a list of my matches, a file showing which of the matches are in common with each other (based around fourth cousins and closer only), and details from my matches' trees.  This last file, 'ancestors', has over 345,000 lines of data in it, which seems a staggering amount to consider dealing with.  Unfortunately, most of it is probably not relevant to my connections with my matches: the majority of them are in the USA and few have traced their connection back to the UK, which is where most of my pedigree information relates to.

Although I do have three Ancestry Hints, which have been helpful, I don't appear in any 'DNA Circles'.  So I've been looking at the "shared matches", to see what clues I can garner from those. Ancestry provides details of my matches that are fourth cousins and closer, and indicates where they share DNA with another of my close matches.  It does also show the more distant matches that are shared matches to the closer cousins - but only by showing the closer match on the more distant match's profile.  Given how many thousands of distant matches I have, I do not check each of their profiles individually to see if they just happen to match a closer cousin.  So the app download makes this feature more useful, by picking up those more distant matches who are in common with the fourth cousins, as well as providing the information in a more convenient (i.e. spreadsheet) format.

I have 59 matches within the '4th cousins or closer' category and 379 rows in the ICW* file downloaded by the Client app, which, as far as I am aware, includes each individual who connects to one of my '4th cousins or closer' matches.  That's probably not many in comparison to people with colonial US ancestry but I imagine it's about average for those of us in the UK.  And it is enough to do some simple 'network analysis', which I hope might allow me to make more sense of the data.

Let me say here that I don't really know anything about proper network analysis - I think that's complicated computing, with thousands of entries, which produces things like the Genetic Communities.  It involves lots of statistical calculations and terms that I don't even understand the meaning of, let alone know how to use! But most of us are probably capable of using some simple techniques - the basic concept for what I am doing I learnt when studying for a GCSE in psychology, so that's a qualification designed for teenagers. In that course, we were using it to analyse friendship patterns in a class of schoolchildren.  The "sociometric" technique simply consisted of asking each child in a class who their three best friends in the class were.  One then drew a diagram something like the following, where each dot is a person and the arrow shows the direction of the 'choice'.

It occurred to me some years ago that this type of diagram could possibly be used to help analyse genealogical networks and I had hoped to use it in my Parry One-Name Study to try to sort out the potential relationships among the lower gentry of Herefordshire (which contains numerous Parry connections that may, or may not, relate to the same Parry family). I came across a (free!) program* that looked like it would be useful for actually drawing the diagram (it is easy to do by hand but, if there's a lot to draw, a computer obviously makes it easier).  However, I never managed to get all the pedigrees typed up sufficiently to try it out for my study.  Now, with doing genetic genealogy, it seems to me that the same principle could be used with shared matches.

And so the following diagram shows the connections between my shared matches at Ancestry:

In this image, each red dot represents one of my matches, and the blue lines indicate the other matches that they also match.  I am not using arrows, just lines, as the genetic relationships will be in both directions.
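
For anyone who would rather try this in code than in a drawing program, the "dots and lines" idea can be sketched in a few lines of Python. This is only an illustration, not how the chart above was made: the names are made up, and the function name `build_adjacency` is my own invention. Each line is stored in both directions, because the genetic relationship runs both ways.

```python
from collections import defaultdict


def build_adjacency(pairs):
    """Build an undirected 'who is in common with whom' map from pairs of names."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # a match is never "in common with" themselves
        # Record the line in both directions, since no arrows are needed.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Made-up example rows; a pair listed twice still gives only one line.
icw_pairs = [
    ("Alice", "Bob"),
    ("Bob", "Alice"),
    ("Bob", "Carol"),
    ("Dave", "Erin"),
]

adjacency = build_adjacency(icw_pairs)
for person in sorted(adjacency):
    print(person, "->", sorted(adjacency[person]))
```

Each key is a dot on the chart and each name in its set is the other end of a blue line.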

As you can see, the matches fall into groups.  Sometimes these are made up of just two or three people who are shared matches with each other.  But there are also some larger groups, one of about 50 connections, and the other with over 150 connections.

It was interesting to see how the data plotted, but how does this help me?

Well, my theory, as you've possibly guessed by now, is that the people in the same group are likely to connect to me (at some level) through the same ancestral line.
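
The grouping itself can also be done without a chart. The sketch below is a hedged illustration rather than the method used for this post: it treats each group as a set of matches that can be reached from one another by following shared-match lines (what network people call a "connected component"). The function name `group_matches`, the made-up names, and the choice to number the largest group first are all my own assumptions.

```python
from collections import defaultdict, deque


def group_matches(pairs):
    """Return {group_number: [names]} for matches linked by shared-match lines."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Walk outwards from this person, collecting everyone reachable.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            person = queue.popleft()
            members.append(person)
            for neighbour in sorted(adjacency[person]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(members))

    # Assumption: number the groups from largest to smallest.
    groups.sort(key=lambda g: (-len(g), g[0]))
    return {number: members for number, members in enumerate(groups, start=1)}


# Made-up example rows.
example_pairs = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Dave", "Erin"),
    ("Frank", "Alice"),
]

icw_groups = group_matches(example_pairs)
for number, members in icw_groups.items():
    print(f"Group {number}: {', '.join(members)}")
```

Note that a single line between two otherwise separate clusters will merge them into one group, so a very large group may really contain more than one ancestral line.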

So, firstly, I allocated everyone in each group an 'AncestryICW Group Number' (both in the Notes section of my view of their DNA profile on Ancestry and in my spreadsheet) to help me keep track of the Groups.   I also added any information about potential surname connections.  Here's the same diagram, with those numbers added and also some additional symbols based on my family history. (Key in the bottom right corner of image)

As you can see, Group 1 (derived just from the genetic relationships provided by Ancestry) contains two people who share the surname NAYLOR with me. For one of these I have discovered the potential connection; the other currently just has the surname in common with me.

I've also 'starred' one match - over the weekend, I carried out a new download of the shared matches file. There were 32 new rows added since the previous download, which, once charted, increased the size of some of the existing groups and also created a few new ones.  (NB these are not new 'fourth cousins or closer' - these are more distantly related new matches, who just happen to connect to my fourth cousins and closer.  As such, I would not normally have checked them out, among the many new distant matches that keep being added.)
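
Spotting which rows are new between two downloads is easy to do by eye for 32 rows, but it can also be sketched in code. This is a minimal illustration under stated assumptions: the column names `match` and `shared_match` are hypothetical (the real Client app file may use different headings), the data is made up, and `read_pairs` is my own helper name. A pair is treated as the same whichever way round it is listed.

```python
import csv
import io


def read_pairs(csv_text):
    """Read shared-match rows into a set of unordered pairs.

    Assumes hypothetical column headings 'match' and 'shared_match'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {frozenset((row["match"], row["shared_match"])) for row in reader}


# Made-up stand-ins for the previous and the latest download.
previous_download = "match,shared_match\nAlice,Bob\nBob,Carol\n"
latest_download = "match,shared_match\nBob,Alice\nBob,Carol\nCarol,Frank\n"

added = read_pairs(latest_download) - read_pairs(previous_download)
new_pairs = sorted(tuple(sorted(pair)) for pair in added)

for a, b in new_pairs:
    print(f"New shared match: {a} - {b}")
```

With real files, the text would come from opening the two downloaded spreadsheets rather than from strings typed into the script.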

I was just starting to work through them, adding the group numbers to my spreadsheet and checking if the people had trees attached to their account, when I noticed the surname NAYLOR.  Yes, one of the new additional matches in Group 1 also had a NAYLOR in their tree!  It was just one, a NAYLOR female marrying into their SMITH family, with no other information about her except her husband's name, and their child's details.  And the family were in the 'wrong' place in the UK (up in Lancashire, rather than in London) - but obviously I didn't leave it there.

By initially working on the husband of the SMITH child, and then finding him and his wife in the 1939 Register, I was able to obtain her proper birth date (1895, not 1885 as shown on the pedigree). That correction meant that I could then find her in the 1901 and 1911 censuses with her parents - her mother being the NAYLOR by birth. Those censuses gave me sufficient information to get back to the previous generation - who traced back to London and the entries I believe relate to my family in 1841!

All of this still needs confirming properly, especially the early censuses for the family, which I had found some months ago when identifying the other NAYLOR connection, who is in Australia.

But it all looks very promising that my new match and I are fourth cousins through the NAYLOR line.

So, just the process of simply grouping my shared matches, on the basis of who they are in common with, has been sufficient for me to spot a connection that I may not have seen otherwise, since the new match was identified by Ancestry as a more distant 5th-8th cousin, sharing just 9.7 cM across 1 DNA segment. I understand that there may be other reasons for shared DNA of that quantity.  But, unless I can find other evidence to contradict it, the simplest explanation does seem to be logical: that the three matches in Group 1 who all share the NAYLOR surname with me inherited that shared DNA from a common NAYLOR ancestry.

* Network analysis program used for drawing chart: Pajek (http://mrvar.fdv.uni-lj.si/pajek/)  [One day, I hope to learn to use the program properly, as I am sure it could potentially display the DNA information more effectively, taking account of features such as the closeness of relationships etc]

* ICW stands for "in common with" - the term often used for matches who also match someone else you match.
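
For anyone wanting to try Pajek with their own shared matches, the pairs need to be turned into a file the program can open. As far as I understand its plain-text ".net" format, that is a numbered list of vertices followed by a list of edges given as pairs of those numbers. The sketch below is an illustration under that assumption only: the function name `to_pajek_net` and the names in the example are made up, and it writes nothing beyond the basic vertex and edge lists.

```python
def to_pajek_net(pairs):
    """Turn pairs of names into text in Pajek's basic .net layout."""
    names = sorted({name for pair in pairs for name in pair})
    # Pajek vertices are numbered from 1.
    index = {name: number for number, name in enumerate(names, start=1)}

    # Keep each undirected line once, whichever way round it was listed.
    edges = sorted(
        {tuple(sorted((index[a], index[b]))) for a, b in pairs if a != b}
    )

    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[name]} "{name}"' for name in names]
    lines.append("*Edges")
    lines += [f"{a} {b}" for a, b in edges]
    return "\n".join(lines) + "\n"


# Made-up example rows.
example_pairs = [("Alice", "Bob"), ("Bob", "Alice"), ("Bob", "Carol")]
net_text = to_pajek_net(example_pairs)
print(net_text)
```

Saving the printed text to a file with a .net extension should give something Pajek can read, although names containing double quotes would need extra handling.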


  1. Interesting analysis. Obviously a different program but producing similar graphic representations to Shelley Crawford's. Which is easier to use I wonder?

    1. Thanks for your comment, Pauleen. A couple of people mentioned NodeXL and Shelley's articles after I wrote this and I'm currently working through those. At the moment, I'd say Shelley's is easiest, as her instructions are very clear and easy to follow, whereas Pajek has been a "trial and error" process for me and I don't really understand enough about what it can, or can't, do. In Blaine's Facebook group (Genetic Genealogy Tips and Techniques) another program, Cytoscape, has also been mentioned, which might be another alternative. Shelley commented that she's been trying several such programs ("Cytoscape, Gephi and Tulip") and they all have pros and cons, so "easier" may come down to personal choice, based on level of understanding, how many matches you're dealing with, and what you're trying to achieve at the time. :-)