Beider-Morse Phonetic Matching: An Alternative to Soundex with Fewer False Hits

Searching for names in large databases containing spelling variations has always been a problem. A solution to the problem was proposed by Robert Russell in 1912 when he patented the first soundex system. A variation of Russell’s work, called the American Soundex Code, is used by the U.S. Census Bureau to facilitate name searches in the census.

Simply put, soundex is an encoding of a name such that names that sound the same will get the same codes. A search application based on soundex will look for matches of the soundex code rather than matches of the name itself, thereby finding all names that sound like the name being sought.

As an example, the American Soundex code for Schwarzenegger is S625. If the name was misspelled as Shwarzenegger, the code would still be S625, so any search application based on American Soundex would still find the match in spite of that misspelling. If, however, the name was misspelled as Schwartsenegger, the American Soundex code would be S632, so a search application based on American Soundex would not find the match with that misspelling.
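For readers who wish to reproduce these codes, the following is a minimal Python sketch of American Soundex as it is commonly described: the first letter is kept, the remaining consonants are turned into digits, adjacent letters with the same digit are coded once, and h and w are skipped without separating such letters. The function name american_soundex is ours, and the sketch is an illustration rather than a reference implementation of the census rules.

```python
# Digit assigned to each consonant group; vowels, H, W and Y get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def american_soundex(name):
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")      # the first letter's digit also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                 # H and W do not separate equal digits
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code                  # a vowel resets prev, so a repeat is coded again
    return (first + "".join(digits) + "000")[:4]


for spelling in ("Schwarzenegger", "Shwarzenegger", "Schwartsenegger"):
    print(spelling, american_soundex(spelling))
```

The three spellings discussed above come out as S625, S625, and S632, respectively.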

A major improvement to soundex occurred in 1985 with the development of Daitch-Mokotoff (DM) Soundex by Randy Daitch and Gary Mokotoff. DM Soundex is a soundex system optimized for Eastern European names. Under DM Soundex, the correct spelling, Schwarzenegger, has two codes, namely 474659 and 479465. The incorrect spelling, Shwarzenegger, has the same two codes, and the incorrect spelling, Schwartsenegger, has the DM code of 479465, which is one of the two codes for the correct spelling. So a search application based on DM Soundex would find the match with either of these misspellings. This illustrates the advantage of DM Soundex over American Soundex for Eastern European names (Austrian in this case).

Both of these soundex systems have, nevertheless, a major disadvantage—they generate many false hits, requiring the researcher to wade through a lot of extraneous matches. The phonetic-matching method proposed in this paper attempts to alleviate that situation.

Main Principles of Phonetic Matching System

Beider-Morse Phonetic Matching (BMPM) was developed by Alexander Beider (Paris) and Stephen P. Morse (San Francisco). Beider dealt with the linguistic part of this method and Morse with the computer aspects and all technical issues. The major algorithmic decisions were made jointly by both authors.

The main objective of BMPM consists in recognizing that two words written in a different way actually can be phonetically equivalent, that is, they both can sound alike. Unlike soundex methods, however, the “sounds-alike” test is based not only on the spelling, but on linguistic properties of various languages.

For common nouns, adjectives, adverbs, and verbs this task is of limited interest. Except for orthographic and typographic errors, these words rarely have spelling variations. The situation is different for proper nouns (i.e., names). They can appear in documents written in different languages and can be spelled according to the phonetic rules of the language of the document. Determining that two different spellings correspond to the same name becomes even more difficult when the two spellings use letters from different alphabets.

As an example, consider the name Schwarz (standard German spelling). It can appear in various documents as Schwartz (alternate German spelling), Shwartz, Shvartz, and Shvarts (Anglicized spellings), Szwarc (Polish), Szwartz (blended German-Polish), Şvarţ (Romanian), Svarc (Hungarian), Chvarts (French), Chvartz (blended French-German), Шварц (modern Russian), Шварцъ (Russian before 1918), שברץ and שורץ (Hebrew), and שווארץ (Yiddish).

In its current implementation, BMPM is primarily concerned with matching surnames of Ashkenazic Jews. This is due to the list of languages whose graphic and phonetic features are already taken into account. These languages are Russian written in Cyrillic letters, Russian transliterated into English letters, Polish, German, Romanian, Hungarian, Hebrew written in Hebrew letters, French, Spanish, and English. The name matching is also applicable to non-Jewish surnames from the countries in which those languages are spoken.

The structure of BMPM is general, and we are already planning to extend it to additional languages such as Lithuanian and Latvian. We also plan to incorporate Italian, Greek and Turkish, since this would allow BMPM to be applicable to Sephardic names (as well as non-Jewish names from those countries). In order to extend it to a new language, all we need to do is include supplementary rules specific to that language. The rules are not hard-coded into the program; instead the phonetic engine is table driven, and all that is necessary is to add additional tables to support the additional languages. A description of the different tables involved is presented below.

BMPM is designed to be used as a programming tool, and an individual would be very hard-pressed to do the calculations manually. To use the system, a user would enter a name on a form; that name would be transmitted to a server running the phonetic engine that would generate the BMPM code, and that code would then be compared to the BMPM codes that were previously generated for all the names in a specific database. The steps of this comparison are described in the following sections.

Step 1: Identifying the Language

The spelling of a name can include some letters or letter combinations that allow the language to be determined. Some examples are:

tsch, final mann and witz are specifically German
final and initial cs and zs are necessarily Hungarian
cz, cy, initial rz and wl, final cki, letters ś, ł and ż can be only Polish

More often, several languages can be responsible for a letter or a letter combination. For example, ö and ü can be either German or Hungarian, final ck can be either German or English, and sz can be either Polish or Hungarian. Sometimes it can be easier to name the language or the languages in which the letters in question can never occur. For example, y and k are not present in Romanian, v cannot be Polish, and the string kie can be neither French nor Spanish.

The current version of BMPM includes about 200 rules for determining the language. Some of them are general, whereas others include the context in which they are applicable (e.g., beginning or the end of a word, following or preceding some letters). The processing of these rules yields one or several languages that could, in principle, be responsible for the spelling entered by the user.
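The narrowing process can be illustrated with a small Python sketch. It encodes only the examples quoted in this section, for a subset of the supported languages. The rule layout (a pattern, a set of languages, and a flag saying whether the pattern points to those languages or excludes them) and the name guess_languages are assumptions made for this illustration; they are not the format of the actual BMPM tables.

```python
import re

# Subset of the supported languages, enough for the examples in the text.
ALL_LANGUAGES = {"german", "hungarian", "polish", "romanian",
                 "french", "spanish", "english"}

# (pattern, languages, accept): accept=True keeps only these languages,
# accept=False rules them out.
RULES = [
    (r"tsch", {"german"}, True),
    (r"mann$", {"german"}, True),
    (r"witz$", {"german"}, True),
    (r"^cs|cs$|^zs|zs$", {"hungarian"}, True),
    (r"cz|cy|^rz|^wl|cki$|[śłż]", {"polish"}, True),
    (r"[öü]", {"german", "hungarian"}, True),
    (r"ck$", {"german", "english"}, True),
    (r"sz", {"polish", "hungarian"}, True),
    (r"[yk]", {"romanian"}, False),
    (r"v", {"polish"}, False),
    (r"kie", {"french", "spanish"}, False),
]


def guess_languages(name):
    """Return the languages that could be responsible for this spelling."""
    text = name.lower()
    remaining = set(ALL_LANGUAGES)
    for pattern, languages, accept in RULES:
        if re.search(pattern, text):
            remaining = remaining & languages if accept else remaining - languages
    return remaining


print(guess_languages("Rabinowitz"))   # only German is left
print(guess_languages("Szwarc"))       # Polish or Hungarian
```

With so few rules, a spelling that triggers contradictory rules would end up with an empty set; a real rule set has to deal with such conflicts.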

One option of the BMPM engine allows for specifying the language explicitly. That would apply when the database is known to be in a specific language, in which case each name in that database can be encoded using the rules of that language, and the language-determination test need not be done.

Step 2: Calculating the Exact Phonetic Value

In a number of languages, forms of surnames used by women are different from those used by men. For example, it would be Jan Suchy but Maria Sucha. The wife of Mr. Novikov would be called Mrs. Novikova. This occurs in Slavic tongues (including Polish and Russian), Lithuanian, and Latvian. Since the name under analysis can, in principle, be feminine, this step starts with replacing feminine endings with the masculine ones.
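As a rough illustration, replacing feminine endings can be pictured as a suffix substitution that depends on the language. The Python sketch below covers only the two examples given above plus the familiar Polish pair ska/ski and cka/cki. The table FEMININE_ENDINGS and the function defeminize are hypothetical names, and the final Polish rule (a to y) is deliberately crude; the actual BMPM rules are not reproduced here.

```python
# Hypothetical table: for each language, (feminine ending, masculine ending),
# listed from the most specific ending to the least specific.
FEMININE_ENDINGS = {
    "russian": [("ova", "ov")],
    "polish": [("ska", "ski"), ("cka", "cki"), ("a", "y")],
}


def defeminize(name, language):
    """Replace a feminine ending with the masculine one, if a rule applies."""
    lower = name.lower()
    for feminine, masculine in FEMININE_ENDINGS.get(language, []):
        if lower.endswith(feminine):
            return name[:-len(feminine)] + masculine
    return name


print(defeminize("Novikova", "russian"))  # Novikov
print(defeminize("Sucha", "polish"))      # Suchy
```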

After the name has been defeminized, the phonetic engine tries to identify the exact phonetic value of all letters and transcribe them into a phonetic alphabet. Since, in principle, the number of different sounds is huge, we decided to restrict the phonetic alphabet used in BMPM to those sounds that are shared by the languages we were interested in. For example, the difference between Polish y and i was deliberately ignored, because there is no way to express it in non-Slavic languages. Also ignored was the difference between the two sounds expressed in German by ch, those present in words ach and ich. For the same reasons, numerous vowels found in French and English do not figure in our version of the phonetic alphabet, but instead were replaced with the closest equivalents found in Germanic and Slavic languages. The retained list appears in the table below.

Table 1. Phonetic Alphabet

Sign	Example	Sign	Example
a	Like in part	o	Like in port
b	Like in boy	p	Like in pot
d	Like in dog	r	Like in ring
e	Like in set	s	Like in star
f	Like in flag	t	Like in tent
g	Like in dog	u	Like in flu, or oo in good
h	Like in hand	v	Like in vase
i	Like in Nice (the city), or ee as in fleet	w	Like in wax
j	Like y in yes, equivalent to German j	x	Like ch in loch; equivalent to German ch
k	Like in king	z	Like in zoo
l	Like in lamp	S	Like s in sure, or sh in shop
m	Like in man	Z	Like z in azure; equivalent to French j
n	Like in neck

Generally, we chose the same signs for sounds as those used by the International Phonetic Alphabet (IPA). The only exceptions are S and Z, whose IPA equivalents are ʃ and ʒ, respectively. Our choice was dictated by limiting ourselves to standard Latin characters present on any keyboard using the Roman alphabet.

The transcription of the name into the characters found in the above table (a better term for it would be mapping) depends on the result of Step 1. Either Step 1 determined a unique language, or it determined a set of possible languages.

If only one possible language was left after Step 1, the phonetic engine transcribes the spelling to the phonetic alphabet using rules specific to that language. In BMPM, every language possesses its own set of rules for this mapping (fewer than 40 for Romanian, about 80 for German, and more than 130 for Polish). For example, if the language is German, then some of the rules are:

sch maps into the S of our phonetic alphabet
s at the start of the word and s present between two vowels becomes z
w becomes v
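The three German rules above can be written as a small table and applied from left to right, which gives an idea of what a table-driven engine looks like. In the Python sketch below, each rule is a tuple (pattern, left context, right context, phonetic output); this layout, the names GERMAN_RULES and transcribe, and the list of vowels are assumptions made for the illustration. Letters not covered by the three rules are simply copied, so the output is only a partial transcription.

```python
import re

VOWEL = "[aeiouyäöü]"

# (pattern, left context, right context, phonetic output).
# Contexts are regular expressions; an empty string means "any context".
GERMAN_RULES = [
    ("sch", "", "", "S"),
    ("s", "^", "", "z"),        # s at the start of the word
    ("s", VOWEL, VOWEL, "z"),   # s between two vowels
    ("w", "", "", "v"),
]


def transcribe(name, rules):
    text = name.lower()
    out, i = [], 0
    while i < len(text):
        for pattern, left, right, phonetic in rules:
            end = i + len(pattern)
            if (text.startswith(pattern, i)
                    and re.search(f"(?:{left})$", text[:i])
                    and re.match(right, text[end:])):
                out.append(phonetic)
                i = end
                break
        else:
            out.append(text[i])  # no rule applies: copy the letter unchanged
            i += 1
    return "".join(out)


print(transcribe("Schwab", GERMAN_RULES))  # Svab
print(transcribe("Rosen", GERMAN_RULES))   # rozen
```

The order of the rules matters: sch is listed before initial s, so that Schwab starts with S rather than z.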

For certain languages, some letters can be read in several ways. In these cases, the phonetic engine assigns them two (or more) elements from the phonetic alphabet. For example, Polish a normally corresponds to phonetic a. In some cases, however, this letter can result from Polish ą in which the diacritic sign (ogonek under the a) was lost. In this example, the phonetic value would be either om (before b or p) or on (before other consonants).
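When a letter has several possible readings, the result of the transcription is a set of strings rather than a single string. The Python sketch below shows this for the Polish a described above. It assumes, for the sake of the illustration, that the alternative reading is considered only when a consonant follows, and it leaves all other letters untouched; the names readings_of_a and expand are ours.

```python
from itertools import product

VOWELS = set("aeiouy")


def readings_of_a(text, i):
    """Possible phonetic values of the letter a at position i."""
    options = ["a"]
    following = text[i + 1] if i + 1 < len(text) else ""
    if following and following not in VOWELS:
        # The a may stand for ą: om before b or p, on before other consonants.
        options.append("om" if following in "bp" else "on")
    return options


def expand(name):
    text = name.lower()
    choices = [readings_of_a(text, i) if ch == "a" else [ch]
               for i, ch in enumerate(text)]
    return ["".join(parts) for parts in product(*choices)]


print(expand("Bak"))  # ['bak', 'bonk']
print(expand("Dab"))  # ['dab', 'domb']
```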

If Step 1 resulted in more than one possible language, the phonetic engine processes the name using generic rules. To adequately support the languages of the current version of BMPM, we needed to write more than 300 generic rules. There are two types of such generic rules—ones that are language independent and ones that apply only to certain languages.

An example of a language-independent generic rule is the rule for final tz: it can be pronounced only as English ts. Such language-independent generic rules are applied regardless of which languages are present in the output of Step 1. Other generic rules might be applicable, however, to specific languages only. The output of Step 1 would determine whether or not these language-specific generic rules would be applied. For example, ch can be mapped (using the signs of our conventional phonetic alphabet) to x in Polish or German, S in French, or the affricate tS in English or Spanish. If during Step 1 we learn that English, Spanish, and French are not possible, only the Polish/German language-specific rule will be applied, causing the ch to be mapped to x.
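The ch example can be expressed in a few lines of Python. The dictionary CH_READINGS holds the readings listed above, and the hypothetical function readings_of_ch keeps only those that belong to the languages still possible after Step 1. This is a sketch of the idea, not the way the generic rules are actually stored.

```python
# Phonetic value of ch in each language, as listed in the text.
CH_READINGS = {
    "polish": "x",
    "german": "x",
    "french": "S",
    "english": "tS",
    "spanish": "tS",
}


def readings_of_ch(possible_languages):
    """Readings of ch that remain once Step 1 has narrowed the languages."""
    return sorted({CH_READINGS[lang] for lang in possible_languages
                   if lang in CH_READINGS})


print(readings_of_ch({"polish", "german"}))   # ['x']
print(readings_of_ch(set(CH_READINGS)))       # ['S', 'tS', 'x']
```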

Once the name is processed by either the generic rules or the language-specific rules, the phonetic engine applies to the resulting string of phonetic characters a series of phonetic rules that are common to many languages. As an example, consider the rule known in linguistic literature as final devoicing. It applies to many European languages, such as German, several Slavic tongues including Russian and Polish, and some dialects of Yiddish. Final devoicing states that at the end of the word the voiced consonants are pronounced as their unvoiced counterparts—i.e., b is pronounced as p; v as f; d as t, etc. The phonetic engine takes this peculiarity of speech into account and keeps in the final position only the unvoiced consonants. For example, Perlov gives Perlof. Another rule, also applied by the phonetic engine, is that of regressive assimilation, whereby a consonant acquires characteristics of the consonant that follows it:

Voiced consonants become unvoiced when followed by unvoiced consonants. For example, b before s is pronounced as p: Shabse is equivalent to Shapse.
Unvoiced consonants become voiced when followed by voiced consonants. For example, t before z is pronounced as d: Vitzon becomes Vidzon.
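Both rules operate on strings of phonetic characters and are easy to sketch in Python. The pairs b/p, v/f, and d/t come from the text; g/k, z/s, and Z/S are the other standard voiced/unvoiced pairs of the phonetic alphabet. One assumption of this sketch should be stressed: v is not treated as a sound that voices the preceding consonant (in Russian, for instance, it does not). The function names are ours.

```python
# Voiced consonants of the phonetic alphabet and their unvoiced counterparts.
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s", "Z": "S"}
VOICE = {unvoiced: voiced for voiced, unvoiced in DEVOICE.items()}
# Assumption of this sketch: v does not trigger voicing of the preceding sound.
VOICING_TRIGGERS = set("bdgzZ")


def final_devoicing(phon):
    """Replace a voiced consonant at the end of the word by its unvoiced pair."""
    if phon and phon[-1] in DEVOICE:
        return phon[:-1] + DEVOICE[phon[-1]]
    return phon


def regressive_assimilation(phon):
    """Make each consonant agree in voicing with the consonant that follows it."""
    chars = list(phon)
    # Work from right to left so that a change can propagate through a cluster.
    for i in range(len(chars) - 2, -1, -1):
        current, following = chars[i], chars[i + 1]
        if current in DEVOICE and following in VOICE:
            chars[i] = DEVOICE[current]
        elif current in VOICE and following in VOICING_TRIGGERS:
            chars[i] = VOICE[current]
    return "".join(chars)


print(final_devoicing("perlov"))          # perlof
print(regressive_assimilation("Sabse"))   # Sapse
print(regressive_assimilation("vitzon"))  # vidzon
```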

At the end of Step 2, the initial surname is transformed by the phonetic engine into one or several strings of characters that we call the exact phonetic value.

Step 3: Calculating the Approximate Phonetic Value

After the rules mentioned in Step 2 are applied, the phonetic engine applies a series of additional rules. These rules take into account the fact that some sounds can be interchangeable in some specific contexts that are more complex than the contexts considered in Step 2 (“beginning/end of word” or “previous/next letter”). For example, in Russian and Belarusian, unstressed o is pronounced as a. As a result, Mostov and Mastov sound alike because the first syllable is unstressed. On the other hand, there is no interchangeability in the stressed position: Kats and Kots sound different. Since automatic determination of the stress position is non-trivial, we decided to deal with a and o as approximately interchangeable. Other rules allow for phonetic proximity of a pair of sounds resulting in their partial confusion. For example, n before b sounds close to m and Grinberg becomes approximately equivalent to Grimberg. (Note that in Spanish this equivalence is total. Consequently, in Argentina, Grinberg and Grimberg are exactly equivalent.)
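One simple way to picture approximate equivalence is to fold the interchangeable sounds into a single representative, so that approximately equivalent names receive the same key. The Python sketch below does this for the two rules just described. Folding into one key is a simplification chosen for this illustration (the engine itself is described as producing one or several strings), and approximate_key is a hypothetical name.

```python
def approximate_key(exact):
    """Fold approximately interchangeable sounds of an exact phonetic value."""
    key = exact.replace("nb", "mb")  # n before b sounds close to m
    key = key.replace("o", "a")      # a and o are treated as interchangeable
    return key


# Exact values differ, approximate keys coincide.
print(approximate_key("mostof"), approximate_key("mastof"))
print(approximate_key("grinberk"), approximate_key("grimberk"))
```

Note that Kats and Kots also receive the same key here. This is the price of not determining the position of the stress.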

Just as in Step 2, the approximate rules applied here can be either language-specific or generic, depending on the results of Step 1. To adequately handle the languages of the current version of BMPM, we needed to write about 200 rules common to all languages, about 120 generic rules (some of which are limited to certain languages), and several dozen language-specific rules per language.

At the end of this step, the initial surname is transformed by the phonetic engine into one or several strings of characters that we call the approximate phonetic value.

Step 4: Calculating the Hebrew Phonetic Value

All previous steps, even if they were primarily designed to process Ashkenazic Jewish surnames, can in principle be applied to other cultures too. This step, on the other hand, is specifically Jewish. The main aim of this step consists in taking into account the fact that the initial name as written in Latin or Cyrillic characters can be the result of a transliteration from Hebrew. Such spellings are commonplace in various materials related to the Holocaust. Numerous memorial (yizkor) books of communities from Eastern Europe are written in Hebrew and, as a result, the names they mention appear in Hebrew characters. Many lists from these books were transliterated by Jewish genealogists, and in many cases, the resulting spellings using Latin characters are simply educated guesses. In the online searchable database of the Holocaust victims provided by Yad Vashem in Jerusalem, many surnames from interwar Poland fall in this category—they appear on the Pages of Testimony compiled in Hebrew during 1950s and 1960s, and the spelling using Latin characters often represents a guess by Yad Vashem’s employees.

Since some vowels do not appear in Hebrew spelling and the sounds of other vowels and certain consonants are ambiguous, a transliteration of the same name from Hebrew to Latin characters made by different people can yield different results. For example, פסטר can yield Fester, Faster, Paster, Pastar, Pester, Fasater, Psater etc., בין can correspond to surnames that were spelled in German as Bien, Bin, Bühn, Bün, and Bein; פרימס can be Frimes or Primas.

This step is designed to fix the issues related to the transliteration from Hebrew. To accomplish this, the phonetic engine takes the results of Step 2 and applies a series of additional rules that allow for the ambiguity of certain sounds when dealing with the Hebrew spelling. At the end of this step, the initial surname is transformed by the phonetic engine into one or several conventional strings of phonetic characters that we call the Hebrew phonetic value. Surnames whose Hebrew spellings are the same have identical Hebrew phonetic values. Some examples are Bader and Beder; Brak, Berak, and Barak; Bober, Buber, and Bubar; Brauner, Bronner, and Bruner; Mandel and Mendel; Thaler and Teller; Zipper and Ziffer.

Note that the Hebrew phonetic value calculated here can apply to surnames that are spelled in Latin, Cyrillic, or Hebrew characters. In all these cases, the original characters have already been mapped into the characters of the phonetic alphabet during Step 2. As a consequence, this step deals with strings of phonetic characters only.

Step 5: Searching for Matches

Applications of name matching involve searching for names in electronic lists. Some examples of lists that are of interest to us are:

Names mentioned in reference books on Ashkenazic surnames by Alexander Beider and Lars Menk, published by Avotaynu Inc. (1993 and 2008, respectively)
Names present in sources related to the Holocaust, such as the Yad Vashem list of names, necrologies from various memorial (yizkor) books, lists of inhabitants of various ghettos, prisoners of concentration camps such as Dachau etc.
Names appearing on Ellis Island passenger lists
Names extracted from the Polish or Russian civil records and indexed by the JRI-Poland project
Names used by Jews in Argentina

The phonetic values (exact, approximate, Hebrew) of the name being searched for need to be generated by the phonetic engine at the time the search is performed. Prior to doing any searches, the phonetic value of each of the names in the list needs to be calculated. Some simplifications can be used when processing the entire list of names, because there might be information known about the language and the spellings used within the list.

For example, in reference books on Galician and German Jewish surnames, the orthography of all names conforms to the German spelling. As a result, during Steps 2 and 3, every name is processed by the set of rules specific to the German language. The case of Jewish names from Argentina is more ambiguous: some names are spelled in Spanish, others in German, Romanian, or Polish. Even in this situation, the processing is simplified, because we know that such languages as Hungarian, French, or English are irrelevant and, as a result, numerous rules used during Steps 2 and 3 (those restricted to these languages) can be ignored.

The matching of an individual name to names present in specific electronic lists proceeds in the following way:

If one of the exact phonetic values of this name and a name from the list are identical, we say that the match is exact. These two names are phonetically equivalent.
If one of the approximate phonetic values of this name and a name from the list are identical, we say that the match is approximate. These two names can be (or not be) phonetically equivalent.
If one of the Hebrew phonetic values of this name and a name from the list are identical, we say that the match is Hebrew. These two names can be phonetically equivalent only if at least one of them was originally spelled in Hebrew. If the user knows that neither of them was spelled in Hebrew or results from the transliteration from Hebrew, the Hebrew match is of no importance and can be simply ignored.

Matches done by BMPM are not necessarily commutative, i.e., if a surname A matches a surname B, this does not imply that the surname B will match the surname A. For example, the list of surnames in A Dictionary of Jewish Surnames from the Kingdom of Poland includes the names Bak and Bąk. If a user enters the name Bak, he will get Bąk among the approximate matches, but if he searches Bąk, he will not find Bak. The absence of commutativity here is due to the distinct processing by the system of (1) a name entered by a user, and (2) a name appearing in a reference book. For (1), our algorithm takes into account the possibility that some diacritic sign could be omitted by the user. As a result, a could actually correspond to the original ą. For (2), the exact phonetic value of all letters is known without any ambiguity: when compiling the dictionary of Polish Jewish names, its author never relied upon Polish sources only, but double-checked names in numerous Russian, Yiddish, and Hebrew documents in which the purely Polish confusion between a and ą is impossible. For example, surnames spelled Bak and Bąk in Polish appear in Russian as Бак [Bak] and Бонк [Bonk], respectively. All names whose phonetics were doubtful were not included in the book. As a result, when present in the dictionary in question, the appellation Bak is really Bak and it cannot correspond to the original Bąk.
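The three kinds of match, and the lack of commutativity just described, can be sketched in Python by representing each name as three sets of phonetic values and testing whether the sets share a value. The phonetic values used below for Bak and Bąk are made up for the illustration (the Hebrew values are left empty), and match_kinds is a hypothetical name.

```python
KINDS = ("exact", "approximate", "hebrew")


def match_kinds(query, candidate):
    """Return the kinds of match between a searched name and a listed name."""
    return [kind for kind in KINDS if query[kind] & candidate[kind]]


# Illustrative values only. A name typed by a user may have lost a diacritic,
# so its approximate values include the reading with ą.
bak_typed = {"exact": {"bak"}, "approximate": {"bak", "bonk"}, "hebrew": set()}
bonk_typed = {"exact": {"bonk"}, "approximate": {"bonk"}, "hebrew": set()}

# Names from the reference book are unambiguous.
bak_listed = {"exact": {"bak"}, "approximate": {"bak"}, "hebrew": set()}
bonk_listed = {"exact": {"bonk"}, "approximate": {"bonk"}, "hebrew": set()}

print(match_kinds(bak_typed, bonk_listed))   # ['approximate']
print(match_kinds(bonk_typed, bak_listed))   # []
```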

Implementation Issues

The result generated by the steps above is a set of one or more sequences of phonetic characters. Computers, however, are much more efficient at matching numerical values from some small space than in matching arbitrary character strings. For this reason, the following additional steps are performed on the phonetic values before matching is attempted:

Each phonetic character is assigned a digit so that a sequence of phonetic characters can be replaced by a numeric value. This numeric value can be quite large, depending on the number of phonetic sounds in the name being encoded.
The resulting number is reduced to a small number by taking it modulo some base value. This has the disadvantage that two names that are unrelated phonetically can wind up with the same numeric value. Although this is possible, the likelihood of it happening is small, especially if the base value is carefully chosen. For example, the base should not be a power of ten, because then only the trailing phonetic characters would be represented, and the leading ones would have no effect on the result.
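These two steps can be illustrated with a short Python sketch. It assumes that each of the 25 signs of Table 1 receives a two-digit decimal code (01 to 25) and that the codes are concatenated; the actual digit assignment and base value are not given here, and the prime 1,000,003 used as the default base is an arbitrary choice for the example. The names numeric_value and reduced_value are ours.

```python
PHONETIC_ALPHABET = "abdefghijklmnoprstuvwxzSZ"   # the 25 signs of Table 1
CODE = {sign: index + 1 for index, sign in enumerate(PHONETIC_ALPHABET)}


def numeric_value(phon):
    """Concatenate the two-digit codes of the phonetic characters."""
    number = 0
    for sign in phon:
        number = number * 100 + CODE[sign]
    return number


def reduced_value(phon, base=1_000_003):
    """Reduce the numeric value to a small number (the base is an assumption)."""
    return numeric_value(phon) % base


# With a power of ten as the base, only the trailing characters count:
print(numeric_value("Svarts") % 10_000, numeric_value("kots") % 10_000)
# With a base that is not a power of ten, the leading characters matter too:
print(reduced_value("Svarts"), reduced_value("kots"))
```

With the base 10,000, both names reduce to the code of their last two characters (ts), whereas the other base keeps them apart.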

It should be noted that all the sounds in the name contribute to the BMPM phonetic value and subsequently to the resulting numeric value. This is in contrast to soundex methods in which (1) some sounds such as vowels do not contribute, and (2) the later letters in a name have no bearing on the resulting code value, since the codes truncate after four consonants in American Soundex and six in Daitch-Mokotoff Soundex.

Comparison to Daitch-Mokotoff Soundex

Soundex is one of the solutions proposed in the past to solve the problems of name matching. It has several variants of which the Daitch-Mokotoff (DM) method is the one that is the most commonly used in the domain of Jewish Ashkenazic genealogy.

When soundexing, any letter either receives a numerical value or is simply omitted. Different consonants can receive the same numerical values, for example, b and v, m and n, g and k. All vowels are treated as interchangeable. As a result, contrary to BMPM, soundexing does not search for the equivalence of sounds: even different (but sometimes close) sounds can match. Consequently, when matching names, soundexing may have a significantly larger number of false positives than BMPM. On the other hand, it can find some true matches that are not found by BMPM, because the equivalence is not purely phonetic.

The domain in which soundex seems to be more appropriate than BMPM is when the original form of the name (which is the form as it appears in the list) is not known and all that is known is the form of the name used today. Here are some examples:

Various names starting with Silver—such as Silverberg, Silverstein. Here, Silver came from the original German Silber (or Yiddish zilber). The change is not just phonetic, it is partly semantic—the German/Yiddish word for silver is replaced with its English equivalent.

Names having English stone instead of German stein (Yiddish shteyn)—such as Rotstone instead of Rotstein. The DM value for both of them is the same, although the pronunciation of these two words is significantly different. (The situation is different in the case of green for grün and field for feld; they do match in BMPM too because here the match is phonetic as well).

Tartatski/Tartatzki/Tartacki becoming Tartaski in the U.S. Here we are dealing with Anglicizing—the consonantal cluster tsk never occurs in English, whereas sk is commonly used. Again, phonetically speaking, Tartatski and Tartaski are not equivalent, and for that reason BMPM does not consider them as matches.

In the examples above, DM Soundex can find some Anglicized fits for the following reasons:

Adaptation of sounds from one language to another often changes them to sounds that are different, but still close (and consequently their DM code can be identical).
English is a Germanic language, that is, from the same linguistic group as German and Yiddish. That means that semantic adaptations of Ashkenazic surnames (like Silber to Silver) can produce forms that are close both phonetically and semantically.
DM-Soundex codes include only six digits. So forms shortened by immigrants to a name that contains fewer than seven consonants (or consonant clusters) can match under DM Soundex. BMPM values are based on the entire name, no matter how long it is. For example, both Konstantinovsky and Constantine have the same DM Soundex code but not the same BMPM values.

On the other hand, here are some cases for which neither DM Soundex, nor BMPM will find matches:

Numerous names ending in ovsky/ovski/owski for which their ending was Anglicized to osky/oski
All translations to words sounding different such as Schwarz to Black, and Adler to Eagle
All shortened forms that include more than six consonants.

Hebraicized names will rarely give matches by DM Soundex, because Hebrew is a Semitic language, not from the same family as German/Yiddish/Slavic languages. Moreover, often the Hebraicizing involves some shortening and/or change of letters, which will present problems for BMPM as well. Examples are Perski to Peres, Rabichev to Rabin, Scheinerman to Sharon, Gryn to Ben Gurion, Meyerson to Meir, Shertok to Sharett, Shkolnik to Eshkol, Brog to Barak, not to mention Ezernitsky (Jeziernicki) to Shamir, and Mileykovsky (Milejkowski) to Netaniahu.

Summarizing the above, DM Soundex is more appropriate than BMPM for individual searches made by descendants of immigrants to North America or England who know the names of their ancestors in their Anglicized form only. In that case, the disadvantage of the large number of false positives is outweighed by the advantage of finding some Anglicized forms that would otherwise not be found.

DM Soundex is also more appropriate in cases in which a matching should be done between two lists of names, one of which deals with original names and the other with the Anglicized versions. For example, someone may be searching for matches between names in the Ellis Island passenger records (which contain the original European names) and the U.S. census records (in which names have already been Anglicized).

In other contexts, BMPM is more appropriate than DM. These include:

Automatic processing by computer of large databases in order to find matches between elements of various databases. This was the primary objective that led to the conception of BMPM. If DM Soundex were used in this context, the computer would not be able to weed out the large number of false positives that would be generated.
Searching for individual original names (names used before immigration and not yet Anglicized) in large databases. If we want to quickly find matches between two spellings both of which correspond to the European forms, BMPM will immediately provide the list of fits. In this case, the main advantage of DM (finding of some Anglicized forms) is irrelevant. As a result, if someone knows roughly what the original name of interest was, BMPM will be more appropriate, because it will immediately recognize the equivalence of the numerous variant spellings of Schwartz (given at the beginning of this article) without polluting the list with numerous false positives.

Some matches found by BMPM are not found by the current version of DM Soundex. Below are several examples, along with the reason why they do match in BMPM:

Triphthongs are approximately equivalent to diphthongs: Altmayr matches to Altmayer, Heym to Heyem, Kajm to Kaiem.
Forms with h between vowels or at the beginning of the word are approximately equivalent to those in which h was lost: Johanes and Joanes, Halperin and Alperin.
The letter combinations inm and jnm are approximately equivalent to im and jm: Weinman(n) and Weiman(n), Fajnman and Fajman.
sc before a vowel is not equivalent to s or sch; it can be exactly equivalent to sk: Boscowitz and Boskowitz, Muscat and Muskat.
When one sound expressed in our conventional phonetic alphabet by the signs S (English sh), Z (French j), s and z is followed by another sound from the same group, it can be dropped (due to the phenomenon of regressive assimilation, discussed above in this article). As a result, the following names match exactly: Hirschstein and Hirstein, Ovruchsky and Ovrutsky.
The sound d disappears if it is followed by the sound t or an affricate that starts with t (such as that expressed by ch as in English check). Consequently the following match exactly: Gladtke and Glatcke, Goldzweig and Golzweig, Kurlandchik and Kurlanchik.
Several transliterations into English of Cyrillic vowels followed by e are exactly equivalent: ae, aye, aie and aje (all for Cyrillic ae); oe, oye, oie and oje (all for Cyrillic oe) etc. Examples: Faer, Fajer, Faier and Fayer (Cyrillic Фаер), Meer, Mejer, Meier and Meyer (Cyrillic Меер). In DM Soundex the forms with ae, oe, ee do not match to aye-aie-aje, oye-oie-oje, eye-eie-eje, respectively.
Initial Rh is exactly equivalent to R: Rhau and Rau, Rhein and Rain.

Some of these drawbacks of the DM Soundex can be eliminated easily by introducing new rules (for example, the last one). For others, the logic of the DM-Soundex prevents such pairs from matching.

The above arguments show that globally speaking BMPM and DM are complementary tools: each has contexts in which its application is more appropriate than that of the other method.

Conclusion

The BMPM system has now been incorporated into several search applications on Stephen Morse’s One-Step website. They can all be found at <http://stevemorse.org>. The specific applications and the sections they can be found in are:

Searching for Ellis Island Passengers. See the link titled “Ellis Island Gold Form” in the Ellis Island section.
Searching the Dachau Concentration Camp Records. See the link titled “Dachau Concentration Camp” in the Holocaust section.
Searching Naturalization Records. See the link titled “Footnote Naturalization Records” in the Vital Records section.
Searching Reference Books for Jewish Surnames. See the link titled “Reference Books” in the Holocaust/Eastern Europe section.

In addition, several organizations have requested our coding so that they can incorporate phonetic matching in their searches. One of those organizations, JewishGen, announced at the 2008 IAJGS conference that it will be using phonetic matching with its databases.

Note

The initial work on this algorithm was based on the article by Alexander Beider, “Some Issues in Ashkenazic Name Searches,” AVOTAYNU, Vol. XXIII, Number 1, Spring 2007, pp. 3–13, and the long-term desire of Stephen P. Morse to improve the engine of his various online searchable databases <http://stevemorse.org>, including the Ellis Island Passenger Lists. The initiation of this project (and, more precisely, the personal meeting of its two authors in Newark in July 2007 and their decision to work together) was made possible by the organizational efforts of Sallyann Amdur Sack and the sponsorship provided by the International Institute for Jewish Genealogy (Jerusalem). The two authors would also like to thank Logan Kleinwaks, Gary Mokotoff, and Jean-Pierre Stroweis, who tested the draft versions of BMPM and provided numerous valuable comments.

Alexander Beider is a linguist and the author of a number of books and papers dealing with the etymology of Ashkenazic surnames and given names, as well as the history of Yiddish. He holds two doctorate degrees—the first in applied mathematics from the Moscow Physico-Technical Institute and the second in Jewish studies from the Sorbonne in Paris. Born in Moscow, he currently lives with his family in Paris.

Stephen P. Morse is an amateur genealogist and the creator of the One-Step Website, which provides utilities that assist in searching databases. He has received numerous awards for his work including a Lifetime Achievement award. In his other life, he is a computer professional with a doctorate degree in electrical engineering. He is best known as the architect of the Intel 8086, which sparked the PC revolution 25 years ago.
