DNA – Shared Matches and Clustering

28 Jul

If you have taken a DNA test, you may have noticed your match list, which shows both close and distant genetic cousins. If known relatives have tested with the same company, they may appear in that list. Most of the others, however, are likely strangers who are related to you in some way. How can they help your genealogical research? This blog post answers that question by explaining shared matches and clustering.

Shared matches

Shared matches are DNA matches who share DNA with each other as well as with you. To view the matches you share with any given DNA match, click on that match in your list, then click on shared matches or relatives in common, depending on the site. AncestryDNA, MyHeritage DNA, 23andMe, Living DNA, and Family Tree DNA all have this feature. Identifying the common ancestors of shared matches can help you in your genealogy research.[i]

AncestryDNA identifies the side of the family through which each DNA match is related to you, labeling matches either maternal and paternal or Parent 1 and Parent 2. Ancestry also lets you view your matches sorted by ancestor. When you select an ancestor, you will see a family tree showing the line of descent to you, along with DNA matches who have that same ancestor in their trees.

Ancestry and MyHeritage let you connect a family tree to your DNA results. Not every match in those databases will have a tree connected, and not every tree will be complete. A common ancestor could also be missing from your tree or from your match's tree. If you and a DNA match are related through an unknown ancestor, Ancestry's sort-by-ancestor feature may not show a tree indicating your relationship.

The databases also differ in their shared-match features. MyHeritage shows not only how much DNA you share with each shared match, but also how much DNA your match shares with each one. 23andMe includes triangulation, which indicates whether you and your match share the same segment with the shared match.

Clustering lets you group your DNA matches together. By identifying which cluster is related through the line of interest, you can greatly narrow down which DNA matches to focus on to solve your genealogical problem. The Leeds Method, created by Dana Leeds, is a way to cluster your DNA matches manually. Some companies, such as MyHeritage, offer AutoClustering.

Leeds Method clustering

[ii][iii] The Leeds Method is a manual way to sort your DNA matches into clusters. Ideally, your matches will fall into four color groups: one for each grandparent, or equivalently each great-grandparent couple. If there is pedigree collapse or endogamy, the Leeds Method may be less effective, because many of your matches will fall into overlapping clusters.

The Leeds Method works best with second and third cousin matches, which share between about 90 and 400 centimorgans (cM). Note that these are matches estimated as second to third cousins from the amount of shared DNA, not from known family trees. Begin by copying the usernames of all matches in the second to third cousin range onto a spreadsheet. It is also helpful to record the amount of shared DNA for each match.
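If you keep your match list in a file rather than typing it by hand, the first step (keeping only matches in the roughly 90 to 400 cM range, highest first) can be sketched in a few lines of Python. This is a minimal sketch, assuming you have already copied each match's username and shared cM into a list; the `Match` record, the `leeds_candidates` function, and the sample names are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str          # hypothetical username as copied from your match list
    shared_cm: float   # total shared DNA in centimorgans


def leeds_candidates(matches, low=90.0, high=400.0):
    """Keep matches in the second-to-third-cousin range, highest cM first."""
    in_range = [m for m in matches if low <= m.shared_cm <= high]
    return sorted(in_range, key=lambda m: m.shared_cm, reverse=True)


# Hypothetical sample data
matches = [
    Match("Match A", 1200.0),  # too close for the Leeds list
    Match("Match B", 310.5),
    Match("Match C", 95.0),
    Match("Match D", 42.0),    # too distant for the Leeds list
    Match("Match E", 220.0),
]

for m in leeds_candidates(matches):
    print(m.name, m.shared_cm)
```

The 90 and 400 cM bounds are defaults you can adjust if your own list is unusually short or long.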

Assign a color, in its own column, to the match at the top of your list (the one sharing the most DNA). Open that match in your DNA database and view your shared matches. Give each of those shared matches the same color on the spreadsheet. Then return to the spreadsheet, choose the next match with no color assigned, and repeat the process with a new color. Continue until every match on your list has at least one color.
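The coloring loop described above is essentially a small algorithm, so it can be sketched in code. This is a minimal sketch, assuming you have recorded each seed match's shared matches as a set of usernames; `leeds_colors` and the sample data are hypothetical. Colors are numbered rather than named, and a match can pick up more than one color, which is how overlapping clusters show up on the chart.

```python
def leeds_colors(matches, shared_with):
    """Assign Leeds Method colors.

    matches: usernames sorted by shared cM, highest first.
    shared_with: maps a username to the set of that match's shared matches.
    Returns a dict mapping each username to its list of color numbers.
    """
    colors = {name: [] for name in matches}
    next_color = 1
    for seed in matches:
        if colors[seed]:
            continue  # already colored; move on to the next uncolored match
        color = next_color
        next_color += 1
        colors[seed].append(color)
        for other in shared_with.get(seed, set()):
            # Only matches on the spreadsheet get colored; a match may
            # collect several colors, which is how overlap appears.
            if other in colors and color not in colors[other]:
                colors[other].append(color)
    return colors


# Hypothetical data for illustration only
matches = ["A", "B", "C", "D", "E"]
shared_with = {
    "A": {"B", "C"},
    "D": {"C", "E"},
}
print(leeds_colors(matches, shared_with))
# {'A': [1], 'B': [1], 'C': [1, 2], 'D': [2], 'E': [2]}
```

Here match C is shared with both A and D, so it ends up in two clusters, just as a spreadsheet row would carry two colors.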

Microsoft Excel's sort and filter features let you sort your matches by color automatically, or view only certain colors. Be careful, though: a misstep in sorting or filtering can scramble your carefully built chart. To avoid that, make sure the entire chart is selected before turning on sorting and filtering.

The researcher applied the Leeds Method to a client project and to her own family. In the client's cluster chart, a few matches appeared in several overlapping clusters. Because the client knew how they were related to a few of these matches, the researcher was able to identify a cluster of interest: the matches who were shared with Match J but not with Match P.
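Selecting matches "shared with Match J but not with Match P" is a set difference. A minimal Python sketch, assuming the two shared-match lists have been copied into sets; the usernames below are hypothetical.

```python
# Hypothetical usernames; in practice these sets come from the
# shared-match lists for Match J and Match P on the testing company's site.
shared_with_j = {"Match K", "Match L", "Match M", "Match N"}
shared_with_p = {"Match L", "Match N", "Match Q"}

# Matches shared with J but not with P
cluster_of_interest = shared_with_j - shared_with_p
print(sorted(cluster_of_interest))  # ['Match K', 'Match M']
```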

The researcher also had access to her own DNA results and her father's. Her cluster chart had the ideal four clusters, but her father's did not: his chart had a few small clusters that overlapped with larger ones. The key individuals in these clusters did not share DNA with each other, but they did share DNA with many of the same genetic cousins. Using this knowledge, the researcher merged clusters until her father's chart had four.
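The researcher merged clusters by judgment. One way to make that judgment repeatable is to measure how many genetic cousins two key individuals have in common and merge their clusters when the overlap is high. The sketch below is a hypothetical heuristic, not part of the Leeds Method itself: the overlap measure, the 0.5 `threshold`, the function names, and the sample data are all assumptions for illustration.

```python
from itertools import combinations


def overlap(a, b):
    """Fraction of the smaller shared-match set that also appears in the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_clusters(clusters, key_shared, threshold=0.5):
    """Merge clusters whose key individuals share many of the same cousins.

    clusters: cluster id -> set of member usernames
    key_shared: cluster id -> set of shared matches of that cluster's key individual
    threshold: hypothetical cut-off for "many of the same genetic cousins"
    """
    merged = {cid: set(m) for cid, m in clusters.items()}
    shared = {cid: set(s) for cid, s in key_shared.items()}
    while True:
        pair = next(
            ((a, b) for a, b in combinations(sorted(merged), 2)
             if overlap(shared[a], shared[b]) >= threshold),
            None,
        )
        if pair is None:
            return merged
        a, b = pair
        merged[a] |= merged.pop(b)  # fold cluster b into cluster a
        shared[a] |= shared.pop(b)


# Hypothetical data: four clusters that should collapse to two
clusters = {1: {"A", "B"}, 2: {"C"}, 3: {"D", "E"}, 4: {"F"}}
key_shared = {1: {"A", "B", "X", "Y"}, 2: {"X", "Y", "Z"},
              3: {"D", "E", "W"}, 4: {"W", "V"}}
print(merge_clusters(clusters, key_shared))
```

In this example the key individuals of clusters 1 and 2 never appear in each other's lists, yet they share cousins X and Y, so the two clusters are merged; clusters 3 and 4 merge the same way through W.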

Because the researcher is studying her paternal line, she focused on her own paternal clusters and her father's paternal clusters. Comparing the two cluster charts and their shared matches allowed her to identify her paternal grandfather's cluster on both charts. Finding the common ancestors of these clusters can help break through brick walls on the paternal line.
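Comparing two cluster charts amounts to asking, for each cluster on one chart, which cluster on the other chart has the most members in common. A minimal sketch, assuming each chart has been written down as a mapping from cluster label to a set of usernames; `best_counterpart` and the data are hypothetical. Ties go to whichever cluster is listed first, so treat close scores with caution.

```python
def best_counterpart(her_clusters, father_clusters):
    """For each of her clusters, find the father's cluster with the most shared members."""
    result = {}
    for hid, members in her_clusters.items():
        scores = {fid: len(members & fm) for fid, fm in father_clusters.items()}
        best = max(scores, key=scores.get)
        # None means no member of this cluster appears on the father's chart
        result[hid] = best if scores[best] > 0 else None
    return result


# Hypothetical charts
her_clusters = {"red": {"A", "B", "C"}, "blue": {"D", "E"}, "green": {"M"}}
father_clusters = {1: {"A", "B", "Z"}, 2: {"D"}, 3: {"Q"}}
print(best_counterpart(her_clusters, father_clusters))
```

A cluster with no counterpart on the father's chart is a hint that it may belong to the mother's side.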

Auto clustering

MyHeritage offers automatic clustering. To begin, go to MyHeritage.com, hover over DNA, and click DNA Tools. Select AutoClusters, then click the Generate button below the description of AutoClustering. In a few days, you will receive an email with your AutoCluster report.

[iv] The report includes a Microsoft Excel file listing the numbered clusters. More visually appealing is the link to your AutoCluster chart, which color-codes the clusters and groups them together. You can rearrange the chart with the sort-by menu, which lets you sort by cluster, by shared DNA, or by name. When you change the option, an animation shows the colors rearranging as the match names are reordered.

The researcher uploaded her DNA to MyHeritage and used the AutoCluster option. MyHeritage does not distinguish between maternal and paternal DNA matches, so the researcher had to manually determine which clusters were related on her paternal side.

Going through each match's tree was laborious, and at least half of the trees contained few ancestors. Some trees were extensive, but the researcher recognized none of the names in them.

If you have a known relative in the database, it is much more effective to use their shared matches to determine how your matches and clusters are related to you. The researcher's only known relative in the MyHeritage database was a maternal aunt. By comparing the matches she shared with her aunt against her AutoCluster list, the researcher identified half of the clusters as maternal. Those clusters are not useful for researching the paternal line.

However, the researcher may have inherited DNA from her mother that her mother did not share with the aunt, so some maternal matches may not be shared between the researcher and her aunt. There is therefore no guarantee that the clusters not shared with the aunt are paternal. Still, this step narrows the number of genetic cousins whose trees the researcher has to comb through. Once she has identified her paternal clusters, she will use them to research her paternal line.
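The aunt comparison can be sketched as a simple labeling pass over the AutoCluster list. This is a minimal sketch, assuming each cluster is a set of usernames and the aunt's shared matches are another set; the function name, the `min_shared` parameter, and the data are hypothetical. Following the caveat above, clusters with no link to the aunt are labeled "undetermined" rather than "paternal".

```python
def classify_clusters(clusters, shared_with_aunt, min_shared=1):
    """Label clusters that contain matches shared with a maternal aunt.

    min_shared: hypothetical minimum number of members that must also be
    shared with the aunt before a cluster is labeled maternal.
    """
    labels = {}
    for cid, members in clusters.items():
        if len(members & shared_with_aunt) >= min_shared:
            labels[cid] = "maternal"
        else:
            # Not shared with the aunt does not prove the cluster is paternal.
            labels[cid] = "undetermined"
    return labels


# Hypothetical AutoCluster output and aunt's shared matches
clusters = {1: {"A", "B"}, 2: {"C", "D"}, 3: {"E"}}
shared_with_aunt = {"A", "E", "X"}
print(classify_clusters(clusters, shared_with_aunt))
```

The "undetermined" clusters are the ones whose trees are still worth combing for the paternal line.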

Through DNA clustering, you can identify which shared matches are relevant to your genealogical problem. The DNA experts at Price Genealogy can help with this.

By Katie

Resources

[i] "Micah's DNA" by micahb37 is licensed under CC BY-SA 2.0.

[ii] "DNA" by Victor Svensson is licensed under CC BY-SA 2.0.

[iv] "DNA rendering" by ynse is licensed under CC BY-SA 2.0.

Introductory photo released into the Public Domain by its author Katrina Darline Posey via Wikimedia.