Going Beyond Ethnicity – A DNA Journey
[Editor’s Note: Our friends at Legacy Tree Genealogists provided the following educational content on DNA, authored by DNA expert Paul Woodbury. And here is an exclusive offer for Geneablogger readers: Receive $100 off a 20-hour research project from Legacy Tree Genealogists using code SAVE100. Valid through May 5th, 2017.]
As a specialist in genetic genealogy, one of the most frequent topics I address in my conversations with others is ethnicity estimates. Someone might say something like: “I’m not really sure how much to trust those genetic tests since my grandmother was Italian, and I only came back with 15% Italian in my results. If they can’t even get the ethnicity right, then what use are they?”
In reality, there are two parts of genetic genealogy test results: ethnicity admixture and genetic matches. Ethnicity admixture results analyze the mutations and segments of DNA and determine in which populations those mutations and segments are most often found. Genetic cousin match lists calculate the number, location and size of segments of DNA that different individuals share in common. Based on the number, size, and location of segments, the relationships between a test subject and their genetic cousins are estimated. While ethnicity results can be helpful in some specific situations, genetic cousin match lists are the most useful element of DNA test results.
Each individual inherits half of their autosomal DNA from each of their parents. Beyond that, the amount of DNA shared in common is only approximate due to a random process called recombination, which shuffles the DNA each generation. Each individual will inherit about 25% from each grandparent, 12.5% from each great-grandparent and approximately half the previous amount for each subsequent generation. Although two first cousins will have both inherited 25% of their DNA from each of their common grandparents (50% in total) they will have inherited a different 25%. Therefore, first cousins will typically only share about 12.5% of their DNA in common. Because descendants along distinct lines inherit different portions of their common ancestors’ DNA, it is important to test as many people from distinct family lines as possible.
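The halving described above can be written out as simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the figures are expected averages, since recombination makes the actual amounts vary from person to person.

```python
def ancestor_fraction(generations: int) -> float:
    """Expected share of autosomal DNA inherited from one ancestor
    who is `generations` steps back (1 = parent, 2 = grandparent)."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    return 0.5 ** generations


def expected_shared_fraction(steps_a: int, steps_b: int,
                             common_ancestors: int = 2) -> float:
    """Expected share of autosomal DNA two relatives have in common.

    steps_a, steps_b: generations from each person up to the common
        ancestor(s); first cousins are 2 and 2.
    common_ancestors: 2 for descent from an ancestral couple (full
        relationship), 1 for a single shared ancestor (half relationship).
    """
    return common_ancestors * 0.5 ** (steps_a + steps_b)


if __name__ == "__main__":
    print(f"Grandparent:       {ancestor_fraction(2):.2%}")
    print(f"Great-grandparent: {ancestor_fraction(3):.2%}")
    print(f"First cousins:     {expected_shared_fraction(2, 2):.2%}")
    print(f"Second cousins:    {expected_shared_fraction(3, 3):.3%}")
```

For first cousins, each of the two shared grandparents contributes an expected 1/16 of shared DNA, which gives the 12.5% mentioned above.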
Every individual in your DNA match list shares at least one segment of DNA with you that you likely inherited from a recent common ancestor. Based on the number of segments you share, the length of those segments, the position of those segments, and the likelihood of inheriting those segments over multiple generations, DNA testing companies estimate how closely related you are to different individuals in your match list. Close relationships share distinct amounts of DNA. For example, the amount of DNA shared between siblings will be very different from the amount of DNA shared between first cousins, which in turn is distinct from the amount of DNA shared between second cousins. More distant relationships, however, are slightly harder to differentiate. The amount of DNA shared between fourth cousins could be the same as the amount of DNA shared between fifth or sixth cousins. Some more distant cousins may not share any DNA at all: even though both inherited DNA from their common ancestors, they may have inherited different segments.
So why are DNA match lists more useful than ethnicity estimates? While your ethnicity admixture results may report a general region of the world where your ancestors may have lived 300-1000 years ago, match lists give valuable clues regarding genealogical relationships to other individuals. Most of your genetic cousins are related to you within a genealogically relevant time frame. Even if you are not able to determine the exact common ancestor between you and your genetic cousins, their test results and their pedigrees may offer clues regarding specific towns and places of origin for your own ancestors. Through analysis and correlation of the trees, origins, and ancestors of members of your DNA match list, you may be able to identify previously unknown ancestors, uncover likely relatives, connect with lost branches of your family tree, and break through genealogy brick walls.
To make the most of your DNA match lists, consider the following four principles: collaboration, identification, organization, and evaluation.
Genetic cousin match lists can be overwhelming. Where to start? How to begin? I recommend starting with what is closest to you. Who are your closest cousins? The more DNA a genetic cousin shares with you, the more likely it is that you will be able to identify a common ancestor with that individual. Even if you can already see how you might be related to someone, collaboration can still be helpful. Just as they will have inherited different DNA than you have from your common ancestors, they will also have inherited different stories, information, and documents that may be helpful for your search.
When collaborating with genetic cousins, make your communication brief, clear, and to the point. If it is your first attempt at contact, introduce yourself, then explain your research interests and why you are contacting them. Make 1-3 specific requests of them, offer to provide assistance or information in return, and provide direct contact information if desired.
For example, an attempt at collaboration might look like this:
“My name is [your name here] and it appears that we are genetic cousins. I have been doing genealogy research for the past five years and I am particularly interested in learning more about my maternal grandmother’s French ancestry. Based on our shared DNA and shared relatives, it appears that you may be related to my maternal grandmother. Do you have ancestry from Southern France? Do you have a family tree you can share with me? If not, could you share the names of your grandparents or great-grandparents? I would love to collaborate with you to determine the nature of our shared relationship. I have performed thorough research on my French family and have several hundred documents relating to that side of my family. If we can determine our relationship, I would be happy to share the documents, sources, and information pertinent to your family tree. Feel free to contact me through this messaging system or directly via email [email here] or by phone at [phone number here].”
Some common requests you might make while collaborating with genetic cousins include the following:
- Request access to a family tree.
- Request the names of the individual’s ancestors, keeping in mind that typically it is better to ask for the names of grandparents or great-grandparents rather than parents. Asking for information regarding living individuals may make some individuals feel uncomfortable and may prevent them from responding to your request.
- Request that they transfer their test results to GEDmatch.com or another website so you can explore your relationship further.
- Request that they share their ethnicity report or their match list with you.
- Request contact information for other relatives who may know more regarding their family history.
- Request information about the amount of DNA and the known relationships they may have with genetic cousins you share.
- Ask if they have any close genetic cousins who have also tested. Knowing which of their close relatives you do not match may help to narrow down how you are related.
- If their relationship is already known, request information that they may have regarding your shared ancestor and collateral relatives.
In one recent case at Legacy Tree, we were attempting to locate information regarding a client’s biological father, whom she had never met. She had a name and an occupation, and that was all we had to start with. When we reviewed her test results, we found that she had a close genetic cousin who was an estimated second cousin. Based on this match’s relationships to the client’s other matches, and based on her ethnicity, we knew that she was a paternal relative of the client, but did not know exactly how. We could have spent more than 20 hours documenting each of her great-grandparents and all of their descendants, but instead we contacted her to ask for additional information on her family tree. When we mentioned the name of the client’s biological father, the match knew exactly who we were talking about and gave us information regarding his later family, his immigration to Puerto Rico, and his death, thus pointing us to the exact family of interest and saving us and the client a great deal of effort.
The main goal of most collaboration is to identify the source of shared DNA with a genetic cousin. But what happens when they never respond to your request? Even for non-responsive matches, it is frequently possible to determine how they are related to your family. The key to successful identification is to use every piece of evidence afforded.
Some common pieces of evidence frequently included as part of DNA profiles and which might help your search include the following:
- Username: If the username is unique or if it resembles a real name, use that to guide searches in public records, white pages, published email lists, and social media accounts. Numbers in usernames often refer to important dates like birth or marriage. Many people use the same username with their email and social media accounts. They may also use that same username to publish queries in online genealogy forums relating to their ancestors.
- Profile picture: Use this to compare against yearbooks, newspapers, obituaries, and Facebook. You can perform reverse image searches using TinEye and Google.
- Age, birth date, birthplace, and residence: Use this information to guide searches in newspapers and online directories. Consider searching databases of yearbooks. You can also use this data to search for more recent and updated contact information.
- Small, limited, and private family trees: If they have a tree attached to their test results or to their member profile, use all information it provides. Extend their ancestry for them. If the tree is private, as is frequently the case at Ancestry.com, perform searches of your known family names to see if any of them appear in your genetic cousin’s private tree. Also remember that the default naming pattern for trees at Ancestry.com is the surname of the user followed by “Family Tree.” Other websites follow a similar naming pattern, and the name of the private tree could provide clues regarding your shared ancestry.
- Names of most distant known ancestors, research interests, and lists of surnames: Use this information to perform searches of combinations of surnames in databases of compiled family trees and genealogical records. Once several ancestors of a genetic match have been identified, trace their descendants until you are able to narrow down to the match themselves.
- Centimorgans, percentages, and number of segments shared: Some amounts of shared DNA are unique to specific levels of relationship. You can estimate the likelihood of different levels of relationship using the data published at the Shared cM Project as well as data published in the AncestryDNA help menus and at ISOGG.org.
- Shared DNA matches: Though it may not be possible to identify how a match is related to you specifically, it may be possible to determine their likely relationship based on how they are related to your other known matches.
In general, social media, newspapers, obituaries, and public record databases are excellent sources for locating information on living people. As you perform these searches, however, remember to respect the privacy and wishes of those who may not want to be contacted.
In a recent case we were able to identify the father of a woman born in Melanesia by extending the ancestry of several close genetic cousins using some of the strategies listed above. Even though these genetic cousins did not respond to requests for collaboration, and even though they provided very little information regarding their family trees on their respective DNA profiles, we were able to reconstruct this woman’s British ancestry using the trees we constructed through public records for her close genetic cousins and searching for connections between collateral relatives of each cousin. Once we had reconstructed her tree we were able to trace descendants of each of her likely ancestors and identify her father.
Dealing with a huge number of autosomal DNA matches can be overwhelming and confusing. I recommend organizing matches based on their known relationships to the test subject and to each other. Organization of DNA evidence follows some of the same principles as organization of traditional research. Just as good genealogy researchers will keep logs of their searches and their correspondence, genetic genealogists should also keep logs of their research and correspondence. These “logs” often take the form of notes and commentary on genetic matches. Each DNA testing company offers means of annotating DNA matches, but frequently these notes are not searchable, making it somewhat difficult to locate “that one match who was related to so-and-so.”
Several third-party tools can assist in organizing your DNA matches and your notes on those matches. The AncestryDNA Helper Chrome add-on by Jeff Snavely enables automated scans of AncestryDNA data and will add buttons to your interface at Ancestry.com. Included in these buttons is an option to search your results by user, reported surnames, or notes. Another third-party tool is the DNAGedcom Client by Rob Warthen. This subscription app enables researchers to perform automated scans of DNA test results at Ancestry.com and 23andMe. The outputs of these scans are spreadsheets with information on shared DNA, ethnicity estimates, in-common-with matches, and notes on genetic matches. As spreadsheets, they are searchable and make it easy to locate any notes that have been added to specific matches in the subject’s account.
Spreadsheets are an excellent way of organizing DNA matches. Each line in the spreadsheet can be dedicated to a different genetic cousin or match. We recommend keeping separate spreadsheets for different tests or different subjects. In spreadsheets, researchers can comment on shared segments, known relationships, potential relationships, shared surnames, shared ancestral origins, and shared genetic cousins between a subject and a match. These notes can then be used during the analysis and correlation stages of the Genealogical Proof Standard.
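For readers who prefer to keep such a spreadsheet as a plain CSV file, here is a minimal sketch of the one-row-per-match layout with a searchable notes column. The column names and the sample rows are hypothetical, chosen only to mirror the kinds of comments listed above. They are not the export format of any testing company or third-party tool.

```python
import csv
import io

# Hypothetical column layout: one row per genetic cousin.
FIELDS = ["match_name", "shared_cm", "segments",
          "known_relationship", "shared_surnames", "notes"]


def write_matches(rows, handle):
    """Write match rows (dicts keyed by FIELDS) as CSV to an open handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def search_matches(handle, term):
    """Return every row in which any column contains `term` (case-insensitive)."""
    term = term.lower()
    return [row for row in csv.DictReader(handle)
            if any(term in (value or "").lower() for value in row.values())]


# Made-up sample data for illustration.
sample_rows = [
    {"match_name": "Match A", "shared_cm": 255, "segments": 12,
     "known_relationship": "second cousin", "shared_surnames": "Jones",
     "notes": "Replied to message; descends from John Jones"},
    {"match_name": "Match B", "shared_cm": 48, "segments": 3,
     "known_relationship": "", "shared_surnames": "Martin",
     "notes": "No tree; shares matches with Match A"},
    {"match_name": "Match C", "shared_cm": 90, "segments": 5,
     "known_relationship": "maternal side", "shared_surnames": "Dupont",
     "notes": "French ancestry"},
]

buffer = io.StringIO(newline="")   # stands in for a file on disk
write_matches(sample_rows, buffer)
buffer.seek(0)
hits = search_matches(buffer, "match a")
print([row["match_name"] for row in hits])
```

Because the search looks at every column, a query such as "match a" finds both the row for that match and any other row whose notes mention it, which helps with locating "that one match who was related to so-and-so."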
Another popular program for organizing DNA matches is Genome Mate Pro. As professional genealogists, we rarely utilize this program for clients since it requires a significant amount of input before meaningful results can be organized. Nevertheless, it is very useful as a database and organization tool for personal research and investigation.
Organizing your DNA matches is a daunting task not only because there may be a large number of them, but also because they are constantly changing. Developing a strong organization structure can seem like an attempt to hit a moving target. It can be even more daunting if there are multiple moving targets. The purpose of organization is to enable genealogical discovery, and genealogical discovery is most often achieved when pursued through the lens of a narrow and specific focus. Technology is meant to serve as a tool to enable a researcher’s goals and purposes, but sometimes it can become the end in and of itself. This is as true of genetic genealogy testing as it is of any other type of technology. Without clear goals and research objectives, the tools genetic genealogy offers can end up being your taskmasters. Rather than letting your DNA test results dictate the direction of your research, use genetic genealogy test results as a tool to make genealogical discoveries. Rather than attempting to document your relationship to each genetic cousin in your match list (an increasingly impossible task as more and more people test), seek to identify your relationship to your closest matches and then use that information to guide your prioritization and investigation of other more distant matches. Choose a specific research objective and then use your test results to narrow down to a pool of matches that are most pertinent to your genealogy research questions. This will make organization of your matches much more manageable and much more useful.
We recommend focusing on your closest matches and matches that appear to be pertinent to the specific research questions you are exploring. If a genetic cousin shares more than 50 cM with you, there is about a 50% chance they are related within 9-10 generational steps and there is a much higher chance you will be able to identify a common ancestor. Once close genetic cousins have been identified, you can then search for other more distant cousins who are likely related through the same ancestral lines by identifying genetic cousins who match at least two known descendants of an ancestor of interest. You can also eliminate other genetic cousins from consideration in your research if they match known relatives from your other family lines. If you are attempting to extend unknown ancestry, document which relatives belong to your known family and then prioritize investigation of those who also match them and who have unknown relationships. Use relationships of genetic cousins to each other to identify which genetic cousins are most pertinent to a research question.
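The filtering rule in this paragraph (keep matches who share DNA with at least two known descendants of the ancestor of interest, and set aside those who match known relatives from other lines) can be sketched with set operations. Everything here is hypothetical: the function name, the names of the relatives, and the shared-match data, which in practice would come from each company's shared-match or in-common-with lists.

```python
def pertinent_matches(shared_with, line_of_interest, other_lines, min_shared=2):
    """Select unidentified matches that likely belong to the line of interest.

    shared_with: dict mapping each unidentified match to the set of known
        relatives that the match also shares DNA with.
    line_of_interest: known descendants of the ancestor being researched.
    other_lines: known relatives from the test subject's other family lines.
    min_shared: how many descendants of the ancestor a match must share.
    """
    selected = []
    for match, relatives in shared_with.items():
        if relatives & other_lines:
            continue  # matches a relative from another line: set aside
        if len(relatives & line_of_interest) >= min_shared:
            selected.append(match)
    return sorted(selected)


# Made-up example data.
line_of_interest = {"Cousin Ann", "Cousin Ben", "Cousin Cal"}
other_lines = {"Maternal Aunt", "Maternal Cousin"}
shared_with = {
    "Match 1": {"Cousin Ann", "Cousin Ben"},
    "Match 2": {"Cousin Ann"},
    "Match 3": {"Cousin Ben", "Cousin Cal", "Maternal Aunt"},
    "Match 4": {"Cousin Ann", "Cousin Ben", "Cousin Cal"},
}

print(pertinent_matches(shared_with, line_of_interest, other_lines))
```

Setting a match aside in this way is a prioritization aid, not proof: a match can be related through more than one line, so excluded matches may still deserve a second look later.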
Chromosome mapping is a type of organizational strategy that can be helpful in some situations and can guide collaboration with genetic cousins. For chromosome mapping, focus on identifying your relationship to known second cousins and more distant relatives. Then identify the segments of DNA you share in common. Individuals who share those same segments of DNA with you are likely related through the same ancestral lines. Though chromosome mapping is useful as an organizational strategy, it can also easily become an end in and of itself. Remember that your main objective will typically be to make genealogical discoveries and extend ancestral lines. In our experience, chromosome mapping is a last resort for making genealogical discoveries. Analysis of relationships, evaluation of shared DNA, and extension of family trees between genetic cousins are the most useful approaches for genealogical discovery.
In a recent case performed by Legacy Tree, one of our clients was attempting to extend the ancestry of her great-grandfather, who was born in the Southern U.S. in about 1840 with the very common name of John Jones. Several candidate ancestral couples had been identified as possible parents, but exhaustive traditional research had not provided conclusive evidence for any of the candidates. We constructed a “genetic network” of her 500 estimated 4th cousins and identified all of the genetic cousins to whom each of them was related. With this information, we used proprietary technology to quickly identify groups of related individuals among the client’s genetic matches. We eliminated from consideration those genetic cousins who were related through her maternal ancestry and identified several genetic cousins who were related through the ancestor of interest. Using these genetic cousins as a “search query,” we next identified all genetic cousins who were related to at least two descendants of the client’s great-grandfather or who fit as part of their genetic network group. Using this strategy, we identified common ancestors between more distant relatives. As a result, we were able to connect the client’s great-grandfather to his ancestral family and extend his ancestry an additional four generations.
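The proprietary technology used in that case is not described here, so the following is only a simple stand-in for the general idea of grouping related matches. It treats each pair of matches who share DNA with each other as a link and collects linked matches into groups (connected components of the shared-match graph). The function name and the sample pairs are hypothetical.

```python
from collections import defaultdict, deque


def cluster_matches(shared_pairs):
    """Group matches into clusters of people linked by shared matching.

    shared_pairs: iterable of (match_a, match_b) tuples, one for each pair
        of genetic cousins who also match each other.
    Returns a list of clusters, each a sorted list of match names.
    """
    graph = defaultdict(set)
    for a, b in shared_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first walk collects everyone reachable from `start`.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            person = queue.popleft()
            group.append(person)
            for neighbor in graph[person]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(group))
    return clusters


# Made-up example: two separate groups of matches.
pairs = [("Ann", "Ben"), ("Ben", "Cal"), ("Dee", "Eli"), ("Cal", "Ann")]
print(cluster_matches(pairs))
```

With real match lists, a single person related through two lines can merge otherwise separate groups, so a plain connected-components pass like this tends to work best after closer relatives and known other-line matches have been set aside.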
Successful genetic genealogists apply DNA inheritance patterns and probabilities of relationship to specific research problems. Once you have identified a likely relationship between yourself and a genetic cousin, determine if your proposed relationship fits with the observed amount of DNA you share with each other. Some questions you might consider include the following:
- Does the amount of DNA you share with your genetic cousin fit with what you would expect given your documented relationship? In other words, does your documented second cousin share an appropriate amount of DNA to be a full second cousin, or is it possible he may be a half relative or may be related in some other way?
- Are there other ancestral lines that you share in common with your match which could provide alternative explanations for your shared DNA?
- Are there other ancestral lines that match 1 shares with match 2 independent of your relationship to either of them? In other words, does your maternal first cousin also share ancestry with your paternal first cousin independent of their respective relationships to you?
- Do you share other types of DNA that you would expect given your proposed relationship? If your proposed genealogical relationship indicates that you share common direct-line paternal ancestry, do you share a common Y-DNA signature? If not, there may be a case of misattributed parentage. If your proposed genealogical relationship indicates that you share common direct-line maternal ancestry, do you share a common mtDNA signature? If not, again there may be a case of misattributed parentage. If you share common ancestors who could have contributed DNA to both of your X-chromosomes, do you share DNA on the X-chromosome, and if not, what is the likelihood of that scenario given your proposed relationship?
- Are there any ancestral lines that are not well represented in your DNA match list? Are there close genetic cousins with known relationships to each other, but no known relationship to you?
- Are there known relatives whom you might invite to test who could represent ancestral lines that are not represented in your match list? Once they have tested, do they share the amount of DNA that would be expected given their relationship? Do they share DNA with other individuals from the family of interest who do not share DNA with you?
When evaluating your DNA test results, it is possible to determine the probabilities of likely relationships based on the number of segments and the number of centimorgans shared. Centimorgans (cM) are a measure of genetic recombination, and communicate the likelihood that two points on a single chromosome will be separated in one generation. Some ranges of shared centimorgans are more likely for specific levels of relationship than they are for others. For example, if an individual shares 255 cM with a test subject, there is more than a 50% chance that they are related at the level of second cousins and nearly a 100% chance that they are related within the range of first cousins once removed to second cousins once removed, or some equivalent combination of relationships. The AncestryDNA Matching White Paper includes a chart showing the probabilities of different levels of relationship given an observed amount of shared DNA.
In addition to this resource, we also recommend reviewing information from the Shared cM Project hosted by Blaine Bettinger, and the autosomal DNA statistics pages available through the International Society of Genetic Genealogy wiki (isogg.org). Considering the likelihood of proposed relationships given shared amounts of DNA strengthens the traditional and genetic evidence for genealogical proof.
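As a rough first check before consulting those resources, an observed amount of shared DNA can be compared against the average expected for a few candidate relationships. The sketch below is a crude heuristic, not the probability model from the white paper or the Shared cM Project. It assumes a total of about 6,800 cM for the comparison (an assumption, since companies report somewhat different totals), uses the simple halving rule for expected sharing, and ranks candidates by how close their expected value is to the observed one. All names in the code are hypothetical.

```python
TOTAL_CM = 6800.0  # assumed total; actual totals differ between companies

# (generations from person A, generations from person B, common ancestors)
RELATIONSHIPS = {
    "first cousin": (2, 2, 2),
    "first cousin once removed": (2, 3, 2),
    "second cousin": (3, 3, 2),
    "half second cousin": (3, 3, 1),
    "second cousin once removed": (3, 4, 2),
    "third cousin": (4, 4, 2),
}


def expected_cm(steps_a, steps_b, common_ancestors):
    """Average shared cM under the halving rule and the assumed total."""
    return TOTAL_CM * common_ancestors * 0.5 ** (steps_a + steps_b)


def rank_relationships(observed_cm):
    """Order candidate relationships by closeness of expected to observed cM."""
    return sorted(
        RELATIONSHIPS,
        key=lambda name: abs(expected_cm(*RELATIONSHIPS[name]) - observed_cm),
    )


if __name__ == "__main__":
    for name in rank_relationships(255):
        print(f"{name}: about {expected_cm(*RELATIONSHIPS[name]):.1f} cM expected")
```

Under these assumptions, 255 cM falls closest to the second-cousin average, in line with the example above. Real amounts vary widely around each average, and different relationships can have the same expected value (here, half second cousins and second cousins once removed), so published ranges and probabilities should take precedence over this kind of estimate.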
In a recent case at Legacy Tree, we were assisting an individual to document the relationship between herself and a genetic cousin with an unknown relationship. Based on the amount of DNA they shared in common, they should have been related at the level of third cousins. Nevertheless, comparison of their two trees revealed that neither shared any common surnames, ancestors, or locations in their quite extensive family trees. Additional investigation into their shared matches showed that the match held several genetic cousins in common with the subject, all of whom descended from a specific ancestral couple who lived in the 1880s in Tennessee. Consultation of the client’s match list revealed that she had no genetic cousins from the ancestry of her paternal grandfather, and additional analysis revealed that her father was likely not the biological son of the man he assumed was his father. In another case, we discovered that one genetic cousin shared DNA with a client through their common fourth great-grandfather, and that both of them matched several other descendants of the same man. However, the match’s brother did not share DNA with the client and did not match any of the descendants of the common ancestor of interest. Additional investigation revealed that the match’s brother was in fact a half-sibling. These stories highlight the fact that DNA testing can result in unexpected discoveries that may change the way you view your family, so it is important to tread carefully and be respectful of the feelings of the individuals involved.
Though ethnicity results can be helpful in some cases of genealogical research, there is so much more that can be done with your test results beyond the dinner-conversation topics of your ethnicity admixture. Collaborate with your genetic cousins to connect with living family members and learn information about your shared heritage. Identify your relationships to genetic cousins and document your relationships to each other. Organize your DNA matches to better analyze your test results. Evaluate your shared DNA with your known relatives and determine if your proposed relationships fit with what you would expect. By following the basic principles of collaboration, identification, organization, and evaluation you will be well on your way to making genealogical discoveries using your DNA test results.
Paul Woodbury is a Senior Genealogist with Legacy Tree Genealogists, a genealogy research firm with extensive expertise in genetic genealogy and DNA analysis. To learn more about Legacy Tree services and its research team, visit the Legacy Tree website at https://www.legacytree.com
Disclosure statement: I have material connections with various vendors and organizations. To review the material connections I have in the genealogy industry, please see Disclosure Statement.