Books have always been one of the best resources for family history: they are often well organized and well researched, and many were written by contemporaries of our ancestors. But for those of us who have spent countless hours in libraries hunting for that one golden nugget of new genealogical information, the inefficiency of searching books is all too apparent. In the past few years, there have been remarkable initiatives, including by Google, to scan and digitize millions of books and make their content searchable online. However, even in the new digital format, searching books has remained inefficient, and extracting the value from books has long been beyond the reach of many genealogists.
This challenge was the impetus behind MyHeritage’s latest technology: Book Matching. This innovation automatically searches MyHeritage’s vast collection of digitized historical books for the individuals found in family trees on MyHeritage. Unique to MyHeritage, the new technology uses semantic analysis to understand every sentence on every page of the digitized books, in order to find matches with very high accuracy. Book Matching has already produced over 80 million new matches for MyHeritage users. Every match is a paragraph from a book specifically about the person in the family tree, providing direct access to that paragraph and the ability to browse through the rest of the book.
By way of background, MyHeritage first launched SuperSearch™, its search engine for historical records, in 2012. In December 2015, the collection of digitized historical books was added to SuperSearch™. Very recently, MyHeritage tripled the size of this collection to 450,000 books, with a total of 91 million pages. MyHeritage has assembled a team of curators who are adding hundreds of thousands of additional digitized books to the collection each year. The Book Matching technology was released a few weeks ago.
The Challenge
Even after books were photographed and converted to digital, searchable text using optical character recognition (OCR), searching them used to require a big investment of time and a willingness to sift through endless false positives. For example, if you had a Moshe Solomon in your family tree, a text search in online books would return results for anyone called Moshe or Solomon, regardless of whether it was a first or last name. Even if a Moshe Solomon were found, it would likely not be the one you were looking for. There was no way to search specifically for the Moshe Solomon from your family tree (for example, the Moshe Solomon born in the Bronx in the early 1920s who married a Lillian Greenberg).
Book Matching to the rescue
MyHeritage’s Book Matching technology overcomes these difficulties by automatically understanding the narrative that describes people in the historical books, including names, events, dates, places and relationships, and matching it with extremely high accuracy and speed against the 2 billion individuals in the family trees on MyHeritage. The technology can thus pick out the right Moshe Solomon from the Bronx, married to Lillian and born in the early 1920s, and provide the user with a high-value source of new information about him, if such information was published. The matching process is repeated automatically as users grow their trees and as MyHeritage adds more books.
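To make the idea of matching on more than a name concrete, here is a minimal Python sketch. It is not MyHeritage's algorithm, which is not published in this article; the Person class, the match_score function, the two-year tolerance on birth years and the exact birth year 1922 are all assumptions made for illustration. It only shows why comparing several facts at once separates the right Moshe Solomon from a namesake.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical record holding facts about one individual."""
    first_name: str
    last_name: str
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None
    spouse_first_name: Optional[str] = None


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """Case-insensitive equality that is False when either side is missing."""
    return a is not None and b is not None and a.lower() == b.lower()


def match_score(tree: Person, mention: Person) -> int:
    """Count agreeing facts; 0 means the names themselves do not match."""
    # First and last name must agree in the right order, so a
    # "Solomon Moshe" is not mistaken for a "Moshe Solomon".
    if not (_same(tree.first_name, mention.first_name)
            and _same(tree.last_name, mention.last_name)):
        return 0
    score = 1
    # Assumed tolerance: birth years within two years count as agreeing.
    if (tree.birth_year is not None and mention.birth_year is not None
            and abs(tree.birth_year - mention.birth_year) <= 2):
        score += 1
    if _same(tree.birth_place, mention.birth_place):
        score += 1
    if _same(tree.spouse_first_name, mention.spouse_first_name):
        score += 1
    return score


# The person in the family tree (birth year 1922 is an assumed value).
in_tree = Person("Moshe", "Solomon", 1922, "Bronx", "Lillian")
right_one = Person("Moshe", "Solomon", 1921, "Bronx", "Lillian")
namesake = Person("Moshe", "Solomon", 1885, "Chicago")
swapped = Person("Solomon", "Moshe")
```

In this sketch the right candidate agrees on every fact, the namesake agrees on the name alone, and the swapped name is rejected outright, so a threshold on the score would keep only the first.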
A Daunting Task Made Simple
In structured documents, such as birth certificates or census records, the type of information presented in the data is well-defined and consistent. It is clear where to find surnames, birth dates, and so on. For that reason, matching family trees to structured data can be straightforward.
On the other hand, in unstructured free-text data, like digitized historical books, facts such as names, events, dates, locations, and relationships can be written in many different ways and in varying contexts, and the information has no designated location or order. The challenges in creating computer algorithms that understand free text are significant. For example, while general phrases like “death”, “died” and “passed away” can all refer to a person’s death, so can less commonly used phrases such as “expired”, “received a fatal gunshot wound”, “ended his earthly career”, or “summoned to the home beyond”. A date such as “In the 8th of December of the year of our Lord 1840” is equivalent to “Dec 8, 1840”. “Sara Cohen” might be referred to as “Sara, the youngest daughter of Mr. Cohen”, and so on.
Specialized technology is needed to follow this, and MyHeritage has created such technology, which understands all these examples and thousands more, and pieces them together. MyHeritage has built numerous algorithms to harvest family history information from books. These have been tested and tweaked, iterated and perfected, to ensure a high level of accuracy.
In the process, MyHeritage also identified and fixed millions of OCR errors. For example, if the OCR process read a person’s birth month as “]\lay”, the technology understands that it is really “May”, that “Apnl” is really “April”, and so on.
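The kinds of normalization described above can be sketched in a few lines of Python. This is only an illustration built from the examples quoted in this article, not the technology MyHeritage uses; the function names (fix_ocr, mentions_death, parse_date), the lookup tables and the two date patterns are hypothetical, and a real system would need far broader coverage.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical lookup tables, limited to the examples quoted in the article.
DEATH_PHRASES = [
    "death", "died", "passed away", "expired",
    "received a fatal gunshot wound",
    "ended his earthly career",
    "summoned to the home beyond",
]
OCR_FIXES = {r"]\lay": "May", "Apnl": "April"}
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# "8th of December ... 1840" and "Dec 8, 1840" (assumed patterns).
DAY_OF_MONTH = re.compile(r"(\d{1,2})(?:st|nd|rd|th)? of ([A-Za-z]+).*?(\d{4})")
MONTH_DAY = re.compile(r"([A-Za-z]+)\.? (\d{1,2}), (\d{4})")


def fix_ocr(text: str) -> str:
    """Replace known OCR misreadings with the intended word."""
    for bad, good in OCR_FIXES.items():
        text = text.replace(bad, good)
    return text


def mentions_death(text: str) -> bool:
    """True if the text contains any of the listed death phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in DEATH_PHRASES)


def parse_date(text: str) -> Optional[date]:
    """Turn either supported date wording into a date, or return None."""
    text = fix_ocr(text)
    found = DAY_OF_MONTH.search(text)
    if found:
        day, month_name, year = found.groups()
    else:
        found = MONTH_DAY.search(text)
        if not found:
            return None
        month_name, day, year = found.groups()
    # Months are looked up by their first three letters.
    month = MONTHS.get(month_name[:3].lower())
    if month is None:
        return None
    return date(int(year), month, int(day))


long_form = parse_date("In the 8th of December of the year of our Lord 1840")
short_form = parse_date("Dec 8, 1840")
ocr_form = parse_date(r"born ]\lay 3, 1850")
```

Here both wordings of the 1840 date resolve to the same value, and the garbled month is repaired before parsing. Simple substring and pattern matching like this breaks down quickly on real books, which is why the article describes the need for specialized technology.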
Currently, some books in the collection are duplicated because they were contributed to the public domain multiple times by different groups, and until now there has been no way to identify the redundant copies. MyHeritage is putting the finishing touches on specialized technology that is able to de-duplicate the books. Once this work is completed in the next few weeks, most of the duplicate matches will disappear.
Book Matching in Action
MyHeritage recently showed some leading genealogists their Book Matches, so they could see first-hand the matches found for their own family trees.
Dick Eastman of Eastman’s Online Genealogy Newsletter has been researching his family history for decades. He has about 2780 people in his family tree on MyHeritage, and he received about 500 Book Matches. The majority of the information in the Book Matches was new to him.
For example, Elizabeth Fifield, the aunt of one of Dick’s direct ancestors (eight generations back), appeared in his family tree with only her birth and death dates and her siblings.
An automatic Book Match was found for Elizabeth in the book “Genealogical and Personal Memoirs Relating to the Families of the State of Massachusetts” (by Cutter, William Richard, 1847–1918), a source that Dick Eastman may never have thought to examine himself.
The excerpt below is the section that was automatically found by MyHeritage’s Book Match. The exciting new information here lists Elizabeth’s husband, and other historical information about him and his family, such as their six children and their dates of birth — information that Dick did not previously have, and that now enables him to add a complete line to his family tree.
Jewish genealogist Rose Feldman has a family tree of 3712 people on MyHeritage. She has received 91 Book Matches. One of her matches was for her relative Martin Cherkasky, from the publication The Nation (Index to Volume 193, July–December, 1961). The match reveals her relative’s professional and social standing (a distinguished doctor, and head of a major New York hospital), as well as an interesting quote from Cherkasky himself about the impact of finances and greed on the medical profession:
Lawyer and genealogist Randol (Randy) Schoenberg is an avid researcher of Jewish Austrian-American family history. Randy is an active curator on Geni.com and has also placed some of the trees he is researching on MyHeritage. In one of those trees there is a Marshall H. Kashman, a distant relative of Randy’s on his wife’s side. He appears in the family tree with limited information.
In an unlikely place — a book called History of the 101st Machine Gun Battalion, 1922 — fascinating new details about this relative are revealed.
In that book, we can see Marshall’s position in the army, his profession in civilian life back in Hartford, Connecticut, that his nickname was “Kash”, and that he survived being gassed during WWI at Verdun. We also see a photo of Marshall as a young soldier.
Summary
Book Matching is a unique technology developed by MyHeritage and is available only on its service. It constantly researches all individuals in every family tree, inside hundreds of thousands of digitized historical books on MyHeritage. The matches are based on semantic analysis of narrative, and are therefore extremely accurate.
Many genealogists love census records, and birth, marriage and death records. Such documents are helpful when building a family tree. But they often contain little beyond names and dates, and rarely describe anything about the character of our ancestors and the key events that shaped their lives. Genealogy is more than just name-collecting. This is what makes Book Matching so valuable: it often reveals intimate details of our relatives’ and ancestors’ lives that one can only get from narrative in a book. Along with adding rich color to the family tree with no effort required from the user, Book Matches also prove to be extremely accurate: none of the genealogists who previewed this feature at RootsTech 2016 encountered a single false positive, among thousands of matches.
Book Matches are useful to seasoned genealogists and family history beginners alike, and offer high value to both. Even the most conservative genealogists, who value manual research and tend to avoid new technologies, will probably appreciate the fact that Book Matching reminds the genealogy community of the rich value of the books that have been in front of us all along.
What’s Next?
Book Matching is currently available for English books only, but the technology will soon be enhanced by MyHeritage to cover additional languages. MyHeritage is constantly expanding its repository of digitized historical records, facilitating easier family history research. Jewish genealogists may receive fewer Book Matches than other genealogists, because Jewish genealogy has not been blessed with a plethora of historical genealogical publications about historical Jewish villages and hometowns. However, as MyHeritage adds historical books written in Hebrew, and enhances Book Matching to support the company’s Global Name Translation Technology, promising new opportunities for discoveries in books will unfold. The Mormon Church has digitized a large number of Jewish genealogy books. Potential cooperation may make those books accessible on MyHeritage, enabling even more matches through the Book Matching technology. So the present is exciting and the future is even more exciting.
How To Access Book Matches on MyHeritage
The Compilation of Published Sources collection is free to search manually. Viewing Book Matches requires a MyHeritage Data subscription. If you have a family tree on MyHeritage, simply log in to your family site and check your Record Matches via the Discoveries menu, and click the collection called Compilation of Published Sources. Your Book Matches will be displayed there. Also check your inbox for Record Match emails — these are sent regularly to MyHeritage users, delivering newly found matches. Any match you receive from a book is made possible by this new technology. If you don’t have a family tree on MyHeritage yet, you may open an account for free on www.myheritage.com and import your family tree as a GEDCOM file, or start a new tree. New matches will be automatically calculated for you, including Book Matches.