During the symposium that inaugurated the International Institute of Jewish Genealogy in Jerusalem in September 2006, one lecture particularly attracted my attention: that given by Professor H. Daniel Wagner of the Weizmann Institute of Science, who enumerated several limitations of standard soundex systems. Wagner spoke about the absence of coverage for characters from alphabets other than the Roman alphabet; the fact that the most commonly used soundex system (Daitch-Mokotoff) can assign the same numerical code to surnames that cannot possibly be variants of each other; and the fact that the soundex approach does not cover the graphical errors that often occur in the genealogical sources he cited. While listening, I realized that these problems had also occurred to me over the past few years, although I had never considered them systematically. The present article attempts to answer the primary questions he raised. It deals only with Jewish surnames from Eastern Europe, although a similar approach can be applied much more generally.
Soundex
In Jewish genealogy, when one wants to search for a name in a large list in which the name may be spelled in different ways, one often uses the Daitch-Mokotoff (D-M) soundex system. The D-M system transforms any word into a numerical code, so that one can then retrieve all names in the list in question that have the same code. In this way, correspondences may be found between surnames spelled differently, for example, Smith, Szmid, Schmidt and Shmit. This feature is particularly important when dealing with Jewish names that originated in Eastern Europe. Indeed, migrants from that area could use one spelling in their country of origin and another in their new countries. This happened for several reasons. First, surnames from the Russian Empire, originally spelled in the Cyrillic alphabet, were spelled in Latin characters in the Americas, Western Europe, Australia and South Africa. Second, even within the same country, numerous spelling variants existed for the same family name. The name of the same Polish Jew might appear in Polish records as Grynsztajn, Grinsztajn, Grünstein, Grynstein, Grynstain, Grynsztayn, Grinszteyn, Grinstain and other variants. In the Daitch-Mokotoff system, all these spellings correspond to the same code: 596436.
The usefulness of the D-M system is well recognized, and it is not surprising that such major online databases as those of JewishGen (www.jewishgen.org) and Ellis Island Jewish Passenger Lists (www.stevenmorse.org) allow users to search based on it.
Currently, the Daitch-Mokotoff soundex system covers only Latin characters. It would be easy, however, to define rules for applying it to other alphabets (Hebrew, Cyrillic, Arabic) as well. The primary limitation of the D-M system is that many unrelated surnames, which could never be alternate spellings of the same name, receive the same numerical code. As a result, when searching for a specific name, the researcher usually must look through a large set of names that includes numerous irrelevant items. For individual searches, this drawback is minor, albeit annoying at times.
On the other hand, the inclusion of numerous unrelated names under the same D-M code precludes the use of this method for automatically processing large lists of names when one wants to find, for each name, its equivalents in another list. One cannot, for example, use the Daitch-Mokotoff soundex to match automatically the surnames in the Ellis Island Database with those that appear in lists of Holocaust victims. To illustrate the problem, consider several pairs of Polish Jewish surnames whose codes are identical although the surnames are obviously independent: Baj and Fau (700000), Żaba and Sowa (470000), Ptakowicz and Witkiewicz (735740), Agnes and Eugieniusz (056400), Akcyg [German Acktzig] and Ochocki (054500), Augienfisz [Augenfisch] and Okuniewicz (056740), Drzewienko and Szybniak (476500), Szajnwar [Scheinwahr] and Sznaper [Schnaper] (467900).
Amending Vowel Coding. To ameliorate the situation, several amendments might be made to the system. The first concerns the processing of vowels. In the D-M system, vowels are ignored when calculating the numerical soundex code. The only exceptions are:
- Combinations of three vowels that are coded as 1
- Vowels at the beginning of a word, coded as 0
With this approach, the absence of any vowel and the presence of an internal diphthong (a sound produced by two adjacent vowels) receive the same treatment; they both are ignored. This result seems inappropriate. In some situations, a diphthong can be shifted phonetically to a monophthong (a sound produced by only one vowel), or may be written using just one vowel, as in English wine, where i sounds the same as the diphthong in the English word “eye.” It cannot, however, be reduced to a zero sound. To repair the situation, one might use the following rules:
- Combinations of two or more vowels are coded 1. The only exceptions are those that consist of two vowels in which the first one is i or y such as ie, ia and yo. (In several languages these combinations actually correspond to monophthongs. On the other hand, combinations such as ai, aj, au, ay, ei, ej, eu, ey, oi, oj, oy, uy and others generally denote diphthongs.1)
- Any single vowel, as well as the two-vowel combinations excepted from the previous rule (that is, those starting with i or y), receives one of two coding variants: it is either ignored or given a 1.2
After applying this amendment, Szajnwar (with the diphthong aj) corresponds to 416790 and 416719, while Sznaper is coded 467900, 461790, 467190 and 461719. That is, these two surnames are now separated; they receive different codes.
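The two amended rules can be tried out with a minimal Python sketch. It assumes a hypothetical, heavily reduced consonant table containing only the letters of the Szajnwar/Sznaper example, treats a j that follows a vowel as part of the vowel group, and keeps the D-M convention of coding a word-initial vowel 0. It is not the full D-M rule table.

```python
from itertools import product

# Hypothetical, heavily reduced consonant table: only the letters needed for
# the Szajnwar/Sznaper example, with their Daitch-Mokotoff digits.
CONSONANTS = {"sz": "4", "cz": "4", "n": "6", "w": "7", "p": "7", "b": "7", "r": "9"}
VOWELS = set("aeiouy")


def tokenize(name):
    """Split a lowercase name into ("V", vowel group) and ("C", consonant) tokens."""
    tokens, i = [], 0
    while i < len(name):
        pair = name[i:i + 2]
        if name[i] in VOWELS:
            j = i + 1
            # Assumption: a j right after a vowel belongs to the vowel group (aj, ej, oj).
            while j < len(name) and (name[j] in VOWELS or name[j] == "j"):
                j += 1
            tokens.append(("V", name[i:j]))
            i = j
        elif len(pair) == 2 and pair in CONSONANTS:
            tokens.append(("C", pair))
            i += 2
        elif name[i] in CONSONANTS:
            tokens.append(("C", name[i]))
            i += 1
        else:
            raise ValueError(f"letter not covered by this sketch: {name[i]!r}")
    return tokens


def vowel_options(group, word_initial):
    """Coding variants for one vowel group under the amended rules."""
    if word_initial:
        return ["0"]                 # kept from D-M: a word-initial vowel is coded 0
    if len(group) >= 2 and not (len(group) == 2 and group[0] in "iy"):
        return ["1"]                 # diphthongs such as aj, au, ei: always 1
    return ["", "1"]                 # single vowels and ie, ia, yo: ignored or 1


def amended_codes(name, length=6):
    """All six-digit codes of a name under the amended vowel rules."""
    options, last_digit = [], None
    for index, (kind, letters) in enumerate(tokenize(name.lower())):
        if kind == "V":
            options.append(vowel_options(letters, index == 0))
            last_digit = None
        else:
            digit = CONSONANTS[letters]
            if digit != last_digit:  # adjacent consonants with one digit are coded once
                options.append([digit])
            last_digit = digit
    codes = set()
    for combo in product(*options):
        codes.add("".join(combo)[:length].ljust(length, "0"))
    return codes


print(sorted(amended_codes("Szajnwar")))  # ['416719', '416790']
print(sorted(amended_codes("Sznaper")))   # ['461719', '461790', '467190', '467900']
```

Because a name now yields a set of codes, two names are treated as potential variants only if their code sets intersect; for Szajnwar and Sznaper they do not.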
Amending Consonant Coding. A second possible modification concerns the consonants. In the Daitch-Mokotoff system, consonants are classified into several groups corresponding to different numbers. For example, the following sounds are coded 5: g, k (a sound that may be expressed using either the letters k or c), h and kh (a sound most often spelled ch, as in Chaim). The use of the same code for these four different sounds has a phonetic basis. Indeed, the following pairs can merge:
- G and k. In numerous languages (including German, Polish and the East Slavic languages), g is devoiced to k at the end of words and before a voiceless consonant, while k can become g before a voiced consonant such as b, d or z.
- G and h. Russian has no h and regularly replaces it with g. In Belarusian, the situation is the opposite: h is used where Russian has g.
- H and kh. Polish has no separate h sound: the letters h and ch are pronounced identically, as kh.
On the other hand, g and kh never merge; neither do k and h. Similar pairs that are never interchangeable can be found for other consonant codes as well. These include:
- D-M code 4: sh (expressed by the English sh, German sch, Polish sz) and z; zh (expressed in Polish as ż, zi, rz) and s; ch (expressed in English as ch, German tsch, Polish cz) and s.
- D-M code 7: b and f; p and v (expressed by the English v, German and Polish w).
A computer program can easily take into account that the above letters or letter combinations are not interchangeable; when someone searches for the soundex equivalents of a name of interest, the number of suggested possibilities can thus be reduced. This approach, for example, separates Baj from Fau (b versus f), Ptakowicz from Witkiewicz (p versus w, that is, v) and Żaba from Sowa (ż versus s).
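A minimal Python sketch of this filter follows. It assumes a hypothetical Polish-spelling-to-sound table that covers only the letters in the examples above, and it assumes that the consonants of two names can simply be aligned by position; when the consonant counts differ, the sketch declines to decide.

```python
# Hypothetical mapping from Polish spelling to sound symbols, covering only the
# letters in the article's examples (not a full Polish orthography table).
DIGRAPHS = {"sz": "sh", "cz": "ch", "rz": "zh", "ch": "kh"}
SINGLE = {"ż": "zh", "w": "v", "h": "kh", "c": "ts"}
VOWELS = set("aąeęioóuy")

# Pairs that share a D-M code but, per the article, never merge.
NEVER_MERGE = {
    frozenset(pair)
    for pair in [("g", "kh"), ("k", "h"), ("sh", "z"), ("zh", "s"),
                 ("ch", "s"), ("b", "f"), ("p", "v")]
}


def consonant_sounds(name):
    """Consonant sounds of a Polish spelling; vowels and post-vocalic j are skipped."""
    name = name.lower()
    sounds, i = [], 0
    while i < len(name):
        pair = name[i:i + 2]
        if pair in DIGRAPHS:
            sounds.append(DIGRAPHS[pair])
            i += 2
            continue
        letter = name[i]
        # A j after a vowel is part of a diphthong (aj, ej), not a consonant.
        if letter in VOWELS or (letter == "j" and i > 0 and name[i - 1] in VOWELS):
            i += 1
            continue
        sounds.append(SINGLE.get(letter, letter))
        i += 1
    return sounds


def never_variants(name_a, name_b):
    """True if position-aligned consonants include a never-merging pair.

    Assumption: only names with equally many consonant sounds are compared;
    otherwise the sketch makes no decision and returns False.
    """
    a, b = consonant_sounds(name_a), consonant_sounds(name_b)
    if len(a) != len(b):
        return False
    return any(frozenset((x, y)) in NEVER_MERGE for x, y in zip(a, b))


print(never_variants("Baj", "Fau"))             # True
print(never_variants("Grynsztajn", "Grinstain"))  # False
```

A search engine would apply such a check only to the names already returned for the same D-M code, discarding those flagged as never variants.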
Sorting by Probability. Surnames with the same soundex value as the name X can be sorted according to the logical probability of their being actual variants of X. To evaluate this probability, one should take into account the following elements that allow us to discern what might be called “indicators of non-equivalence” of names:
- The environment in which the letters with the same soundex code appear. For example, d and t are really interchangeable in many languages in a word’s final position and before consonants. In some cases, the change from one to another can also occur in a word’s first letter. On the other hand, d and t are unlikely to be equivalent when placed between vowels. A similar rule applies to g and k, v and f, b and p, z and s. As a result, Barshab and Barshap are much more plausible equivalents than, say, Kaban and Kapan.
- If, in two surnames, a vowel is present in the same place or no vowel appears there, then these names are better candidates to represent variants of the same name than if one has a vowel and another lacks any vowel in the same position. For example, Barak and Borak are more plausible equivalents than Barak and Brak.
- If two surnames have different vowels in the same position, the probability that they are variants of the same name depends on these vowels. The following, for example, are easily interchangeable: a and o; i, y and j; ü, ue, ie and i; ä, ae and e; ö, oe and e. On the other hand, pairs such as a and u or a and i are significantly less likely to be equivalent.
- The larger the number of “indicators of non-equivalence” in a pair of surnames, the smaller the chance that they really are related. Indeed, even a single indicator of this kind makes it likely that the names are independent.
- Finally, the presence of the same letter in the same position in two surnames makes a link between them more likely than the presence of two different letters there (even if these letters have the same soundex value).
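These criteria can be turned into a sort key. The Python sketch below is one possible reading of them, not a tested scoring scheme: it aligns two names slot by slot (which requires equal numbers of consonants), counts indicators of non-equivalence and same-letter agreements, and sorts by fewest indicators first, then most agreements. The vowel groups and voicing pairs come from the list above; treating sh/ch/kh/zh as single consonants, treating j as a consonant, and the alphabetical tie-break are assumptions.

```python
DIGRAPHS = ("sh", "ch", "kh", "zh")
VOWELS = set("aeiouyäöü")
VOICING_PAIRS = {frozenset(p) for p in [("d", "t"), ("g", "k"), ("v", "f"), ("b", "p"), ("z", "s")]}
# Easily interchangeable vowels; j is omitted because this sketch treats it as a consonant.
EASY_VOWELS = [{"a", "o"}, {"i", "y"}, {"ü", "ue", "ie", "i"}, {"ä", "ae", "e"}, {"ö", "oe", "e"}]


def skeleton(name):
    """Return (consonants, slots): slots[k] is the vowel group before consonants[k],
    the last slot follows the final consonant, and "" means no vowel."""
    name = name.lower()
    consonants, slots, current, i = [], [], "", 0
    while i < len(name):
        if name[i] in VOWELS:
            current += name[i]
            i += 1
            continue
        unit = name[i:i + 2] if name[i:i + 2] in DIGRAPHS else name[i]
        slots.append(current)
        consonants.append(unit)
        current = ""
        i += len(unit)
    slots.append(current)
    return consonants, slots


def vowels_compatible(x, y):
    return x == y or any(x in group and y in group for group in EASY_VOWELS)


def compare(name_a, name_b):
    """(indicators of non-equivalence, same-letter agreements), or None."""
    cons_a, slots_a = skeleton(name_a)
    cons_b, slots_b = skeleton(name_b)
    if len(cons_a) != len(cons_b):
        return None
    indicators = agreements = 0
    for va, vb in zip(slots_a, slots_b):
        if bool(va) != bool(vb):
            indicators += 1              # vowel on one side, none on the other
        elif va and not vowels_compatible(va, vb):
            indicators += 1              # different, hardly interchangeable vowels
        elif va and va == vb:
            agreements += 1
    for k, (ca, cb) in enumerate(zip(cons_a, cons_b)):
        between_vowels = all((slots_a[k], slots_a[k + 1], slots_b[k], slots_b[k + 1]))
        if ca == cb:
            agreements += 1
        elif frozenset((ca, cb)) in VOICING_PAIRS and between_vowels:
            indicators += 1              # d/t, b/p, ... swapped between vowels
    return indicators, agreements


def rank(target, candidates):
    """Candidates sorted by fewest indicators, then most agreements."""
    scored = []
    for name in candidates:
        result = compare(target, name)
        if result is not None:
            scored.append((result[0], -result[1], name))
    return [name for _, _, name in sorted(scored)]


print(rank("Barak", ["Brak", "Burak", "Borak"]))  # ['Borak', 'Brak', 'Burak']
```

Names with a different number of consonants are dropped here rather than scored; a fuller implementation would need a real alignment step.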
Although the above suggestions are primarily intended for software engineers who prepare various online searchable databases, they also may be helpful to individual genealogists using databases or books that include lists of names sorted by soundex.
Transliteration
For the past three years, I have worked on the second edition of A Dictionary of Jewish Surnames from the Russian Empire (DJSRE) [in print, Avotaynu, 2007]. Thanks to the kind collaboration of many people and associations, as well as online access to various Internet databases, I gained access to large lists of surnames from the area in question, which increased the dictionary portion of the book by 40 percent, from 50,000 to 70,000 entries. In addition, for numerous entries already present in the first edition (1993), the section listing the districts where the names were found before 1917 was enlarged dramatically. Often I was dealing with huge lists taken from new sources, for example, more than one million records from the Yad Vashem online database of names of Holocaust victims, or extracts from the approximately 400,000 records of the All Lithuania database prepared by members of the Litvak Special Interest Group for Jewish Genealogy.
One of the most important tasks consisted of distinguishing names from a new source that might really be new from names already present in the 1993 edition but spelled differently. The Daitch-Mokotoff soundex system could not be used for this task because, as explained above, numerous unrelated surnames share the same numerical code and, as a result, no automatic matching can be done. The primary approach to this problem was to compare transliterations. In secondary school, we all learned how to compare fractions: one needs to find their common denominator. The transliteration method is somewhat similar. Before deciding whether two spellings are variants of the same name, one transforms both into forms spelled according to the orthographic rules of the same language. For example, the equivalence of German Schwarz and Polish Szwarc can be easily established if
- the German spelling is transliterated to Polish, or
- the Polish spelling is transliterated to German, or
- both are transliterated to a third language, English, yielding Shvarts.
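The “common denominator” idea can be sketched in Python with two hypothetical, deliberately small rule tables that respell German and Polish spellings in English style; real tables would be far longer. Longest units are matched first, so that German sch is not read as s + ch and Polish cz is not read as c + z.

```python
# Hypothetical, minimal tables: spelling unit -> English-style rendering.
TO_ENGLISH = {
    "de": {"tsch": "ch", "sch": "sh", "ch": "kh", "tz": "ts", "z": "ts", "w": "v", "j": "y"},
    "pl": {"sz": "sh", "cz": "ch", "ch": "kh", "rz": "zh", "ż": "zh", "c": "ts", "w": "v", "j": "y"},
}


def to_english(name, language):
    """Respell a German ("de") or Polish ("pl") name in English-style spelling."""
    table = TO_ENGLISH[language]
    longest = max(len(unit) for unit in table)
    name, out, i = name.lower(), [], 0
    while i < len(name):
        for size in range(longest, 0, -1):
            unit = name[i:i + size]
            if len(unit) == size and unit in table:
                out.append(table[unit])
                i += size
                break
        else:
            out.append(name[i])   # letters missing from the table pass through unchanged
            i += 1
    return "".join(out)


print(to_english("Schwarz", "de"), to_english("Szwarc", "pl"))  # shvarts shvarts
```

Two spellings are then candidate equivalents when their English-style forms coincide.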
In the Russian Empire, surnames originally were written in Cyrillic characters.3 Nevertheless, in almost all major sources used in the preparation of the present second edition, surnames are spelled using Roman alphabet letters. Many actually are the result of transliteration of the original Russian (Cyrillic) spelling. The transliteration methods used by various researchers are more or less traditional, but all differ from the method used in DJSRE, invented especially for that book and not reused by other authors. One major principle guided conceptualization of the transliteration method used in DJSRE: the necessity to be able to restore the original Cyrillic spelling without any ambiguity. Other methods generally do not meet this constraint.
The major differences between the method used in DJSRE and all the other traditional ones concern t_s for the Cyrillic тс, è for э and j for й. The more traditional renderings of the same characters are ts/tz, e and y/i, respectively. In addition, DJSRE uses the apostrophe to transliterate ь, the Russian letter called the “soft sign,” which other transliterations often ignore.
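The reversibility requirement is easiest to see in code. The Python sketch below uses a partial Cyrillic-to-DJSRE table limited to the correspondences stated or illustrated in this article plus plain one-to-one letters; letters outside the table (such as щ and ъ) are refused rather than guessed. Because ц becomes ts while тс becomes t_s, the two never collapse into the same Latin string.

```python
# Partial Cyrillic-to-DJSRE table (an assumption limited to letters whose
# rendering is stated or illustrated in this article, plus plain letters).
DJSRE = {
    "тс": "t_s",  # keeps т + с distinct from ц
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "ы": "y", "ь": "'",
    "э": "è", "ю": "yu", "я": "ya",
}


def to_djsre(name):
    """Transliterate a Cyrillic surname with the partial table above."""
    lower = name.lower()
    out, i = [], 0
    while i < len(lower):
        pair = lower[i:i + 2]
        if len(pair) == 2 and pair in DJSRE:   # two-letter unit (тс) first
            out.append(DJSRE[pair])
            i += 2
        elif lower[i] in DJSRE:
            out.append(DJSRE[lower[i]])
            i += 1
        else:
            raise ValueError(f"no DJSRE rendering in this sketch for {lower[i]!r}")
    result = "".join(out)
    # Restore the capital letter of the original spelling.
    return result[:1].upper() + result[1:] if name[:1].isupper() else result


print(to_djsre("Слуцкий"), to_djsre("Альперн"), to_djsre("Эсс"))  # Slutskij Al'pern Èss
```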
Recent Yad Vashem Testimonies
Numerous Pages of Testimony at Yad Vashem, compiled since the beginning of the 1990s by former Soviet citizens, are written in Russian. Since work on the Yad Vashem searchable Internet database was centralized and the rules for its preparation were clearly defined, names appearing on these Pages of Testimony have been transcribed into Latin characters following a single method of transliteration from Russian to English.4 Analysis of this method made it easy to establish a table of direct correspondences between the letters and letter combinations that appear in Yad Vashem’s records and those used in DJSRE:
- Tz = ts; final i = ij. For example, all surnames that end in ski in Yad Vashem’s database are spelled skij in DJSRE.
- Final y = yj; iy = ij; ei = ej; ai = aj; oi = oj; ui = uj; ia = ya; iu = yu; initial Yo = Io; non-initial yo = e (corresponding to Russian ё); initial E = È; ye = e.
In the above pairs, the first element corresponds to Yad Vashem’s usage; the second reflects usage in DJSRE. In addition to applying these rules, the feminine forms of surnames that frequently appear in the Yad Vashem database were automatically replaced by the corresponding masculine forms, following the DJSRE convention of presenting only masculine forms. This mainly consisted of searching for and replacing the following final elements: ova with ov, eva with ev, ina with in, and skaya and skaia with skij.
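The rules above translate directly into an ordered list of regular-expression substitutions. The article does not prescribe an order, so the one in this Python sketch is an assumption: feminine endings are replaced first, diphthongs before the final-i rule, and initial E before ye. The feminine rules can misfire on a masculine surname that happens to end in -ina or -ova, so results should be reviewed.

```python
import re

# Feminine endings first, so that later rules see masculine forms.
FEMININE = [(r"ska(?:ya|ia)$", "skij"), (r"ova$", "ov"), (r"eva$", "ev"), (r"ina$", "in")]

# The article's correspondences, in an order chosen for this sketch.
RULES = [
    (r"iy", "ij"), (r"ei", "ej"), (r"ai", "aj"), (r"oi", "oj"), (r"ui", "uj"),
    (r"i$", "ij"), (r"tz", "ts"), (r"ia", "ya"), (r"iu", "yu"),
    (r"^yo", "io"), (r"yo", "e"), (r"^e", "è"), (r"ye", "e"), (r"y$", "yj"),
]


def yad_vashem_to_djsre(name):
    """Convert a Yad Vashem Russian-to-English spelling to DJSRE style."""
    result = name.lower()
    for pattern, replacement in FEMININE + RULES:
        result = re.sub(pattern, replacement, result)
    return result[:1].upper() + result[1:]


for name in ["Slutski", "Beizer", "Levina", "Yoffe", "Epshtein"]:
    print(name, "->", yad_vashem_to_djsre(name))
```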
Extracts from Russian Genealogical Sources
The situation is different with extracts from various Russian revision, taxation and family lists, civil records and official Russian newspapers. Archivists from Eastern Europe compiled these extracts primarily for Jewish genealogists, although some were compiled directly by Jewish genealogists themselves. Often this work was not centralized; as a result, various individuals used different transliteration methods. Moreover, in some cases, one can see that the chosen methods were not followed as rigorously as those at Yad Vashem. The same archivists, and especially lay genealogists, sometimes transliterated the same Cyrillic characters in different ways. The following is one approach to transforming these spellings into those used in DJSRE.
First, apply a series of additional rules (the second elements of the pairs below correspond to DJSRE):
- Ay = aj; ey = ej; oy = oj; uy = uj; iy = ij. The elements in the first position of these pairs reflect a transliteration from Russian that differs from those used in both DJSRE and Yad Vashem’s online database. For example, the original Cyrillic Бейзер can be transliterated not only as Bejzer (DJSRE) or Beizer (Yad Vashem), but also as Beyzer.5
- W = v; ck = k; for ch try both kh and ch; tsch = ch; sch = sh; for s try both s and z; for z try both z and ts; final sky = skij; for the final witz try both vich and vits; for ei try both ej and aj; for i try both i and y. The elements cited in the first position of these pairs represent German transliteration from Russian. An old tradition applies German spelling when transliterating Cyrillic letters to the Latin alphabet,6 especially when the surname possesses explicit German elements and, therefore, is of Yiddish or German origin. For example, the Cyrillic Шустер may be transliterated as Shuster (English; as in DJSRE and Yad Vashem) or Schuster (German). Sometimes Germanized forms can be found even in surnames with typical Slavic suffixes. Examples are Abramowitz and Berkowitz instead of the English Abramovich and Berkovich for the original Russian Абрамович and Беркович, respectively.
- H (when not in the combinations ch, kh, sh and zh) = g; chs = ks; eu = ej; v = f; for st try st and sht; for sp try sp and shp; for any double consonant also try its single variant (for example, for nn try both n and nn, for ss try both s and ss); ü = i; ö = e; ä = e; ie = i. The first element in these pairs corresponds to Germanized forms of the original Russian spelling; for example, Fuchs, Horn, Münz, Neuman(n) and Spielberg instead of the original Russian Фукс, Горн, Минц, Нейман and Шпильберг, respectively. The substitution of h for г (g) and eu for ей (ej), as well as the doubling of consonants, occurs when the person preparing the extracts believes that he or she knows which German word(s) the surname derives from and, when passing from Cyrillic to Latin characters, decides to match the spelling of the conjectured German source word(s). Consequently, the resulting forms go beyond pure transliteration.
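Because several of these rules say “try both,” each extract spelling yields a set of candidate DJSRE spellings rather than a single one. The Python sketch below transcribes the rule lists above into one table; the longest-match tokenization and the decision to leave all other letters unchanged are assumptions. It does not apply the second step (the Yad Vashem correspondences), and it cannot restore information the extract never contained, such as the soft sign in Shpil’berg.

```python
from itertools import product

# Word-final units, checked first (final sky, final witz).
FINALS = {"sky": ["skij"], "witz": ["vich", "vits"]}

# Spelling units -> candidate DJSRE renderings; longer units win.
UNITS = {
    "tsch": ["ch"], "sch": ["sh"], "chs": ["ks"],
    "ch": ["kh", "ch"], "kh": ["kh"], "sh": ["sh"], "zh": ["zh"], "ck": ["k"],
    "ay": ["aj"], "ey": ["ej"], "oy": ["oj"], "uy": ["uj"], "iy": ["ij"],
    "ei": ["ej", "aj"], "eu": ["ej"], "ie": ["i"],
    "st": ["st", "sht"], "sp": ["sp", "shp"],
    "w": ["v"], "v": ["f"], "s": ["s", "z"], "z": ["z", "ts"], "h": ["g"],
    "i": ["i", "y"], "ü": ["i"], "ö": ["e"], "ä": ["e"],
}
VOWELS = set("aeiouyäöü")
LONGEST = max(len(unit) for unit in UNITS)


def candidate_spellings(name):
    """Set of DJSRE-style candidates for a German-style extract spelling."""
    name = name.lower()
    options, i = [], 0
    while i < len(name):
        if name[i:] in FINALS:
            options.append(FINALS[name[i:]])
            break
        letter = name[i]
        if letter not in VOWELS and i + 1 < len(name) and name[i + 1] == letter:
            options.append([letter, letter * 2])    # double consonant: single or double
            i += 2
            continue
        for size in range(LONGEST, 0, -1):
            unit = name[i:i + size]
            if len(unit) == size and unit in UNITS:
                options.append(UNITS[unit])
                i += size
                break
        else:
            options.append([letter])                 # letter kept unchanged
            i += 1
    return {"".join(parts) for parts in product(*options)}


print(sorted(candidate_spellings("Neumann")))  # ['nejman', 'nejmann']
```

The candidate set is then matched against the DJSRE list; any hit is a possible equivalent to be confirmed by hand.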
Second, apply the same correspondences as those suggested above for Yad Vashem’s Russian testimony pages.
Unlike the sources discussed above, numerous extracts from the Russian-language civil records of Białystok are spelled neither in English nor in German style, but according to the rules of Polish orthography. These spellings primarily correspond to plain transliterations.7 Nevertheless, the spelling is sometimes chosen to match that of an existing Polish word. For example, Врубель (Vrubel’) mainly appears as Wróbel, while direct transliteration would give Wrubel. Some occupational names ending in Russian in арь (ar’), ар (ar) or арж (arzh) appear in Polish with the ending arz.
Sources from North America
The online Ellis Island passenger lists (EIDB) currently represent one of the most frequently used sources in Jewish genealogy. All surnames appear there in Latin characters. Analysis of the spellings of Eastern European emigrants’ names yields several conclusions. First, the transliteration from Cyrillic to Latin characters clearly was done in Europe, not in the United States: the spellings used are based almost exclusively on German, Polish and Romanian, not on English.8 Before the fall of the Russian Empire, German spelling was by far dominant.9 To reestablish the original Russian spelling, one can follow the instructions presented in the previous section. Those instructions are not exhaustive, however, and should be supplemented by a few others that reflect the Germanizing of Slavic forms: for sch, try both sh and zh; for any tz (not only in the final witz), try both ts and ch; for a final off, try both ov and of; for a final e, try e, a and o. Examples are Abramoff for Абрамов (Abramov), Marchatz for Мархач (Markhach) and Schuck for Жук (Zhuk). German spelling also often devoices consonants before other voiceless consonants: Rapkowitz for Рабкович (Rabkovich), Borofsky for Боровский (Borovskij). Doubled consonants frequently appear in German spellings, often without any morphological reason (that is, the consonants are single in the source words of these surnames), as in Blitt instead of Blit and Barr instead of Bar.
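A Python sketch of these additional Ellis Island rules, producing candidate spellings, follows. It covers only the rules in this paragraph (plus ch and ck, which the examples need). The set of voiceless letters that trigger the revoicing guess is an assumption. In practice these rules would be combined with the German-spelling rules of the previous section.

```python
from itertools import product

# Word-final units: off -> ov/of, sky -> skij, e -> e/a/o.
FINALS = {"off": ["ov", "of"], "sky": ["skij"], "e": ["e", "a", "o"]}
# Multi-letter units in matching order (sch before ch).
MULTI = {"sch": ["sh", "zh"], "tz": ["ts", "ch"], "ch": ["kh", "ch"], "ck": ["k"]}
# Voiceless letters a German spelling may have used for voiced Russian ones.
REVOICE = {"p": "b", "t": "d", "k": "g", "f": "v", "s": "z"}
VOICELESS = set("ptkfsc")   # assumption: letters treated as voiceless followers
VOWELS = set("aeiouy")


def eidb_candidates(name):
    """Candidate Russian-style spellings for a Germanized EIDB spelling."""
    name = name.lower()
    options, i = [], 0
    while i < len(name):
        if name[i:] in FINALS:
            options.append(FINALS[name[i:]])
            break
        unit = next((u for u in MULTI if name.startswith(u, i)), None)
        if unit:
            options.append(MULTI[unit])
            i += len(unit)
            continue
        letter = name[i]
        nxt = name[i + 1] if i + 1 < len(name) else ""
        if letter == nxt and letter not in VOWELS:     # unmotivated doubling
            options.append([letter, letter * 2])
            i += 2
        elif letter in REVOICE and nxt in VOICELESS:   # possible devoicing
            options.append([letter, REVOICE[letter]])
            i += 1
        else:
            options.append(["v"] if letter == "w" else [letter])
            i += 1
    return {"".join(parts) for parts in product(*options)}


print("rabkovich" in eidb_candidates("Rapkowitz"))  # True
```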
Latvian and Lithuanian Sources
During the interwar period, after Latvia and Lithuania became independent, Jewish surnames from that area were respelled according to the grammar of these Baltic languages. In the nominative case, Latvian spelling includes a final s for men and an e for women, omits the consonant h present, but not pronounced, in German forms, simplifies double consonants to single ones and has a number of letters pronounced differently in Latvian than in German. For example, for the German (t)z, sch, tsch, w and ie the corresponding Latvian letters are c, š, č, v and i, respectively. Table 1 illustrates the changes of spelling that occurred in Courland’s Jewish surnames.
In Lithuanian, all surnames also receive special endings. In the nominative case, the masculine form of each surname ends in is, ys, as, us, a, or e depending on the root surname. For example, if the non-Lithuanian form ends in a consonant, then its corresponding Lithuanian form will necessarily end in s. In other grammatical cases, these endings are replaced by other specific endings. These rules are general: they are applied automatically not only for persons living in Lithuania, but even when quoting surnames of foreigners. The Lithuanian spelling also possesses a number of specific characters. For example, Lithuanian č, š, and ž correspond to the sounds expressed by the Russian ч [ch], ш [sh], ж [zh], respectively (exactly as in Latvian). Table 2 presents a sample of the Lithuanian spellings of some Jewish surnames.
Table 2. Correspondences Between Russian and Lithuanian Spellings
| Russian spelling | Lithuanian spelling |
|---|---|
| Альперн [Al’pern] | Alpernas |
| Витенберг [Vitenberg] | Vitenbergis |
| Генс [Gens] | Gensas |
| Зингер [Zinger] | Zingeris |
| Канович [Kanovich] | Kanovičius |
| Левин [Levin] | Levinas |
| Слуцкий [Slutskij] | Sluckis |
| Шустер [Shuster] | Šusteris |
| Эсс [Èss] | Essas |
Feminine forms of Lithuanian surnames are different from the masculine ones. In the nominative case, married women use surnames that consist of the stem of the husband’s surname (that is, without as, is, ys, us), plus the ending enė. Unmarried girls and women have surnames that end in aitė, itė or ytė. Consequently, members of the same family may bear surnames that seem different to a non-Lithuanian, for example, the father Levinas, the mother Levinenė and the daughter Levinaitė. In Russian, the same name, the most common Lithuanian Jewish surname, was spelled Левин (Levin) for men and Левина (Levina) for women.
The information above easily permits one to deduce the original Russian spelling from Latvian or Lithuanian names.
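For the Lithuanian case, a Python sketch of this deduction: respell the specific letters, then strip the grammatical ending. The -ienė ending is an assumption added to the article’s list, and the -skis to -skij correspondence is inferred from Sluckis/Слуцкий in Table 2. The soft sign (Al’pern) and Russian э (Èss) cannot be recovered this way.

```python
# Lithuanian letters with their DJSRE-style values.
LETTERS = {"č": "ch", "š": "sh", "ž": "zh", "c": "ts"}
# Endings, longest first; "ienė" is an assumption beyond the article's list.
ENDINGS = ["aitė", "ienė", "ytė", "itė", "enė", "ius", "as", "is", "ys", "us"]


def lithuanian_to_russian(name):
    """Strip the Lithuanian ending and respell the stem in DJSRE style."""
    word = "".join(LETTERS.get(ch, ch) for ch in name.lower())
    if word.endswith("skis"):
        return word[:-4] + "skij"        # inferred from Sluckis / Slutskij
    for ending in ENDINGS:
        # Keep at least two letters of stem so short names are not emptied.
        if word.endswith(ending) and len(word) > len(ending) + 1:
            return word[: -len(ending)]
    return word


for name in ["Kanovičius", "Sluckis", "Šusteris", "Levinaitė"]:
    print(name, "->", lithuanian_to_russian(name))
```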
Other Transliterations
Thus far, we have discussed only the transliterations that seek to re-establish the original Russian spelling. In Jewish onomastics and genealogy, one often deals with other types of transliterations. For example, when Russian became the official language of Congress Poland10 in 1868, all surnames of Polish Jews were transliterated from Latin to Cyrillic characters. Fifty years later, after the re-establishment of the independent Polish state (1919), the inverse process took place: names were changed from Cyrillic to Latin letters. On the other hand, even before 1868, in numerous cases, the same Polish-Jewish family names could be written according to different spellings: Polish, German or blended Polish-German.
Similar orthographic changes occurred in Galicia. Until the end of the Austro-Hungarian Empire in 1918, German spelling was dominant in that province, although Polish and blended forms sometimes were used as well. In the interwar period, however, a number of names were transliterated from German to Polish.
During the 1890s, the surnames of all inhabitants of Courland were transliterated from German to Russian. In 1919, Bessarabia became part of Romania, and consequently, all local surnames were transliterated from Russian to Romanian, the latter using the Roman alphabet.
Table 3 summarizes the main correspondences between Russian, English, Polish, German and Romanian spellings.
The difference between double and single letters should be ignored; nn is comparable to n and vice versa.
Hungarian, Spanish and French Names
In addition to the languages discussed above, three others deserve special mention in the context of Ashkenazic names: Hungarian, Spanish and French. The role of Hungarian names grew significantly after the mid-19th century when numerous Jews living in the Hungarian part of the Habsburg Empire11 began to change their German-sounding surnames (acquired in the 1780s) to Hungarian ones. Some changed just the spelling in order to fit the rules of Hungarian orthography. Its main peculiarities concern the following letters and letter combinations (English equivalents in parentheses): c (ts), cs (ch), gy (palatalized /d/), j and ly (as y in boy), s (sh), sz (s), ty (palatalized /t/), zs (zh). Some examples of names transliterated to Hungarian are: Grosz from the German Gross, Kis from the German Kisch, Polacsek from the Czech Poláček or the Polish Polaczek.
Spanish spelling is important when dealing with immigrants to Latin America, especially to Argentina. Its most specific elements are: c (as s before e and i; as k elsewhere); ch (as English ch, sometimes also pronounced dzh like j in jump); g (before e) and j (both like ch in German or Polish); ñ for soft n; ll sometimes used for /zh/; gue for ge; qu for k; confusion between b and v (as b); confusion between n and m before consonants; both s and z for s. For these reasons, one can find such Argentinean Jewish surnames as Aberbuj, Bedñak, Guerscovich, Marquenson, Rosemberg and Sapollnik that correspond to the Russian Jewish Averbukh, Bednyak, Gershkovich, Markenzon, Rozenberg and Sapozhnik, respectively.
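The Spanish correspondences can be sketched in Python as a generator of candidate Russian-style spellings. The rules are those listed above, with g mapped to kh only before e, as stated. The sketch does not reproduce every correspondence in the examples; for instance, it cannot recover the Russian sh hidden behind the s of Guerscovich.

```python
from itertools import product

VOWELS = set("aeiou")
# Multi-letter units, tried before single letters (gue must precede g).
MULTI = [("gue", ["ge"]), ("qu", ["k"]), ("ch", ["ch", "dzh"]), ("ll", ["l", "zh"])]


def spanish_candidates(name):
    """Candidate Russian-style spellings for an Argentinean-Spanish surname."""
    name = name.lower()
    options, i = [], 0
    while i < len(name):
        unit = next(((u, alts) for u, alts in MULTI if name.startswith(u, i)), None)
        if unit:
            options.append(unit[1])
            i += len(unit[0])
            continue
        letter = name[i]
        nxt = name[i + 1] if i + 1 < len(name) else ""
        if letter == "c":
            alts = ["s"] if nxt in ("e", "i") else ["k"]
        elif letter == "j" or (letter == "g" and nxt == "e"):
            alts = ["kh"]
        elif letter == "ñ":
            alts = ["ny"]
        elif letter in ("b", "v"):
            alts = ["b", "v"]            # b/v confusion
        elif letter in ("s", "z"):
            alts = ["s", "z"]            # both written for s
        elif letter in ("n", "m") and nxt and nxt not in VOWELS:
            alts = ["n", "m"]            # n/m confusion before consonants
        else:
            alts = [letter]
        options.append(alts)
        i += 1
    return {"".join(parts) for parts in product(*options)}


print("averbukh" in spanish_candidates("Aberbuj"))  # True
```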
When many thousands of Jews from Poland and Romania moved to France and Belgium in the interwar period, some of them changed the spelling of their surnames according to the rules of French orthography. In many cases, however, they retained their original surnames because they already were spelled in Roman alphabet characters. This was not the case for the several thousand Russian Jews who emigrated to France during the first quarter of the 20th century. As a result, one can find in modern Ashkenazic surnames such typical French elements as (English equivalents in parentheses): c before e, i or y (s); c in other positions (k); ch (sh); gne (palatalized n, Russian нь); gu before e, i or y (g); final ine instead of in because of the nasal character of French in; j (zh); ou (u / oo); qu before e, i or y (k); ss between vowels (s); tch (ch). Among the examples of French spelling of Russian Jewish surnames are (DJSRE transliteration in parentheses): Chagall (Shagal), Kikoïne (Kikoin), Krémègne (Kremen’), Kouchner (Kushner), Levine (Levin), Soutine (Sutin) and Vichniac (Vishnyak).
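A Python sketch of the French correspondences, mapping a French spelling to a DJSRE-style form. Its simplifications are assumptions: qu is always read as k, doubled letters are collapsed, and i before a vowel is not turned into y, so Vichniac yields vishniak rather than Vishnyak.

```python
# French units and their DJSRE-style values, tried in this order.
UNITS = [
    ("tch", "ch"), ("gne", "n'"), ("ch", "sh"), ("ou", "u"), ("qu", "k"),
    ("é", "e"), ("è", "e"), ("ê", "e"), ("ï", "i"), ("j", "zh"),
]
FRONT = set("eiyéèêï")   # letters before which c reads s and gu reads g


def french_to_djsre(name):
    """Map a French spelling of a Russian Jewish surname to DJSRE style."""
    name = name.lower()
    out, i = [], 0
    while i < len(name):
        if name[i:] in ("ine", "ïne"):    # final -ine stands for Russian -in
            out.append("in")
            break
        match = next(((u, v) for u, v in UNITS if name.startswith(u, i)), None)
        if match:
            out.append(match[1])
            i += len(match[0])
            continue
        letter = name[i]
        nxt = name[i + 1] if i + 1 < len(name) else ""
        if letter == "c":
            out.append("s" if nxt in FRONT else "k")
        elif letter == "g" and nxt == "u" and name[i + 2:i + 3] in FRONT:
            out.append("g")               # gu before e, i, y is plain g
            i += 2
            continue
        elif letter == nxt:
            out.append(letter)            # doubled letters read as single
            i += 2
            continue
        else:
            out.append(letter)
        i += 1
    return "".join(out)


for name in ["Kouchner", "Soutine", "Chagall", "Krémègne"]:
    print(name, "->", french_to_djsre(name))
```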
General Recommendations
The information presented here allows one to construct an automatic method of searching for surname equivalence that is an alternative to any soundex system. Consider the following situation. Someone is interested in finding equivalents to a surname that is written in language X. In the Internet list in which the search is being done, surnames are spelled according to the orthographic rules of language Y. In this case, the software proposed by the webmaster should have the following features (features generally not yet available):
- The user should have the possibility of entering not only his/her surname of interest, but also the value of X, that is, the language whose orthography the spelling follows, this language chosen from a predefined list.
- When preparing the database for online searches, the spelling of any surname in the Internet list in question should be linked, if possible, to some particular language Y. In many cases, such a link can indeed be established. For example, as discussed above, the spellings of the names of Russian-Jewish immigrants in the Ellis Island Passenger Lists prior to World War I are primarily German, while those used in Yad Vashem’s database for Pages of Testimony compiled by former Soviet citizens generally are spelled in Latin characters according to a transliteration method from Russian to English.
- If the user does not specify language X, the system should try to identify it automatically. Some letters or letter combinations can be helpful: ă, â, ghe, î, ş and ţ are all specific to Romanian; sche and schi are either German or Romanian; sch (in other positions) and mann are typically German; ą, ć, cz, ę, ł, ń, ó, ś, ź, and ż are typically Polish; sz is either Polish or Hungarian; gue is either Spanish or French; v cannot be Polish, while w cannot be French, Romanian or Spanish.
- Taking into account this information as well as the table of correspondences presented above (Table 3), the program suggests possible equivalents. In most cases, the resulting list of potential equivalents will be smaller than that suggested by using any soundex system and will include only surnames whose kinship is plausible.
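The language-identification step can be sketched in Python by intersecting candidate sets. The cues are those listed above; the set of candidate languages, in which “English” stands for English-style transliteration from Russian, is an assumption.

```python
# Assumed set of candidate spelling languages.
ALL_LANGUAGES = {"English", "German", "Polish", "Romanian", "Hungarian", "Spanish", "French"}

# Cues that narrow the candidates to the given languages.
POSITIVE_CUES = [
    (("ă", "â", "ghe", "î", "ş", "ţ"), {"Romanian"}),
    (("ą", "ć", "cz", "ę", "ł", "ń", "ó", "ś", "ź", "ż"), {"Polish"}),
    (("sz",), {"Polish", "Hungarian"}),
    (("gue",), {"Spanish", "French"}),
    (("mann",), {"German"}),
]
# Letters that rule languages out.
NEGATIVE_CUES = [("v", {"Polish"}), ("w", {"French", "Romanian", "Spanish"})]


def guess_languages(name):
    """Languages whose orthography the spelling could follow."""
    name = name.lower()
    candidates = set(ALL_LANGUAGES)
    for cues, languages in POSITIVE_CUES:
        if any(cue in name for cue in cues):
            candidates &= languages
    # sche and schi: German or Romanian; sch in other positions: German.
    for i in range(len(name)):
        if name.startswith("sch", i):
            candidates &= {"German", "Romanian"} if name[i + 3:i + 4] in ("e", "i") else {"German"}
    for letter, excluded in NEGATIVE_CUES:
        if letter in name:
            candidates -= excluded
    return candidates   # an empty set signals contradictory cues


print(guess_languages("Schwarz"))  # {'German'}
```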
No sophisticated software is required to apply the transliteration rules discussed here. During the preparation of the second edition of DJSRE, I performed numerous comparisons between new source lists and the list of surnames already present in the database compiled for the second edition, primarily using just Microsoft Excel, in which I applied a series of consecutive search-and-replace operations in conformity with Table 3. Any genealogist can apply the same rules individually to establish the equivalence of surnames of interest.
Sources Written in Hebrew Characters
When the original source is written in Hebrew, the researcher faces a true puzzle. This results from ambiguity in the pronunciation of a few letters: ב can be read as b or v, פ as p or f, כ as k or kh, ת as t or, for Ashkenazic Jews only, as s, ש as sh or s, ו stands for o, u, v, oy, au and ou. Also contributing to ambiguity of pronunciation is the absence of any sign for a and e. As a result, the Hebrew spelling of Jewish surnames from Eastern Europe cannot really be considered a transliteration. It looks more like encoding, with the same code corresponding to several sources. For example, Berg, Bereg, Barg, Brag and Barag all are written ברג, while Fajn (Fayn), Fain, Fejn (Feyn), Fin, Finn, Pain and Pajn (Payn) all correspond to פין.
For this reason, contrary to all cases discussed in previous sections, a Latin letter transcription of the Hebrew spelling generally is not a transliteration either. It also may incorporate a hypothesis (whether valid or false) of the author of the transcription about the way the original name was pronounced. Such is the case, for example, with numerous Yad Vashem records that result from Pages of Testimony compiled during the 1950s. Detailed analysis shows that, in many cases, the Latin character spelling of names was not provided at the time the Page of Testimony originally was filled out. Often, it was added later by Yad Vashem employees and, as a result, just represents their educated guess.
For someone who wants to establish the exact pronunciation of a name in this situation, the only appropriate way is to look at the picture of the original Page of Testimony, generally supplied on the Yad Vashem website. In rare cases, this allows one to find the Hebrew spellings using diacritical signs for vowels. Very few Pages show the original Russian spelling provided by the submitter. On other Pages of Testimony, one may observe that the handwriting and the ink used when writing in Latin characters seem to be the same as those of the Hebrew spelling. In these cases, the spelling in Latin characters may be reliable since the submitter entered it. Often, however, consulting the picture of the original Page of Testimony gives no positive results. One sees letters from the Roman alphabet inscribed on the Page of Testimony by someone other than the original submitter. Consequently, this transcription should be taken with great caution.
When dealing with Yad Vashem Pages of Testimony from the 1950s, one should also bear in mind that the Latin spelling is often based on the language that was official in the area in question immediately before World War II. For example, surnames from the areas that, in the Russian Empire, belonged to Grodno guberniya, the western parts of Volhynia and Minsk guberniyas, and the southern part of Vilna guberniya are primarily spelled in Polish. Those from Bessarabia follow the rules of Romanian orthography.
The difficulties of Hebrew spelling described above prevented the inclusion of several hundred additional names in the second edition of DJSRE, because the spelling used on the Yad Vashem Pages of Testimony cannot be used to establish the original Russian spelling unambiguously. On the other hand, these difficulties do not prevent researchers from trying to establish the equivalence between individual names of interest and those present in a Hebrew list. It suffices to take the Hebrew spelling, allow for the rules listed at the beginning of this section (the interchangeability of b/v and p/f, the several readings of ו, and the unwritten a and e) and thereby automatically deduce all the elements of that Hebrew list that could, in principle, be equivalents of the surname in question.
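The rule-based deduction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used for DJSRE: the letter table `HEBREW_TO_LATIN` is a hypothetical, deliberately partial mapping built from the ambiguities listed at the start of this section (plus the plain letters needed for the examples), and the function names are invented. Unwritten a and e are modelled as optional vowels between letters, and doubled Latin letters are collapsed before matching.

```python
import re

# Hypothetical, partial table: Latin spellings each Hebrew letter may stand for.
# Only the letters needed for the examples are covered; a real tool needs them all.
HEBREW_TO_LATIN = {
    "ב": ["b", "v"],
    "פ": ["p", "f"],
    "ף": ["f"],
    "כ": ["k", "kh"],
    "ך": ["kh"],
    "ת": ["t", "s"],           # s only in Ashkenazic pronunciation
    "ש": ["sh", "s"],
    "ו": ["o", "u", "v", "oy", "au", "ou"],
    "י": ["i", "j", "y"],      # assumption: yod as vowel or glide (Fin, Fajn, Fayn)
    "ג": ["g"], "ר": ["r"], "נ": ["n"], "ן": ["n"],
    "מ": ["m"], "ם": ["m"], "ס": ["s"], "ק": ["k"],
}
IMPLICIT_VOWELS = "[ae]*"  # a and e have no letter, so they may appear anywhere


def hebrew_pattern(hebrew):
    """Compile a regex matching every Latin spelling compatible with `hebrew`."""
    parts = []
    for letter in hebrew:
        if letter not in HEBREW_TO_LATIN:
            raise ValueError(f"no rule for letter {letter!r}")
        parts.append("(?:" + "|".join(HEBREW_TO_LATIN[letter]) + ")")
    return re.compile(IMPLICIT_VOWELS + IMPLICIT_VOWELS.join(parts) + IMPLICIT_VOWELS)


def possible_equivalents(hebrew, latin_names):
    """Return the names from `latin_names` that the Hebrew spelling could encode."""
    pattern = hebrew_pattern(hebrew)
    result = []
    for name in latin_names:
        # Lower-case and collapse doubled letters (Finn -> fin) before matching.
        key = re.sub(r"(.)\1+", r"\1", name.lower())
        if pattern.fullmatch(key):
            result.append(name)
    return result
```

For example, `possible_equivalents("ברג", ["Berg", "Bereg", "Barag", "Burg"])` keeps Berg, Bereg and Barag but rejects Burg, whose u would have been written with ו.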
Some memorial (yizkor) books dealing with communities from Eastern Europe use Yiddish spelling. In these sources, identifying the original Cyrillic or Roman-alphabet spelling is much easier, because Yiddish spelling has graphic signs for all vowels. Ambiguities arise only if the source does not use diacritics; the following pairs then become indistinguishable: אַ (a) and אָ (o), פּ (p) and פֿ (f), יי (ey) and ײַ (ay).
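One way a search tool might reproduce this loss of information is to strip the vowel points before comparing Yiddish spellings. The sketch below is an assumption, not a feature of any existing database; it uses Python's standard `unicodedata` module: NFD normalization separates each point from its base letter, and every combining mark (category `Mn`) is then dropped. Expanding the ligatures such as ײ into separate letters is a further design choice of the sketch.

```python
import unicodedata

# Yiddish ligatures expanded so that ײ and יי compare equal (a design choice).
LIGATURES = {
    "\u05F0": "\u05D5\u05D5",  # װ double vov
    "\u05F1": "\u05D5\u05D9",  # ױ vov yod
    "\u05F2": "\u05D9\u05D9",  # ײ double yod
}


def strip_yiddish_diacritics(text):
    """Drop vowel points and other combining marks, as an unpointed source would."""
    # NFD splits base letters from their points (including precomposed forms).
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return "".join(LIGATURES.get(ch, ch) for ch in bare)
```

After this normalization, pasekh alef (a) and komets alef (o) both reduce to a bare א, and pe with dagesh (p) and pe with rafe (f) both reduce to a bare פ, which is exactly the ambiguity a researcher faces with an unpointed yizkor book.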
Typographic Errors and Other Misinterpretations
The phonetic ambiguities of some letters discussed above do not cover all identification issues. In many cases, when dealing with printed or online sources, one faces secondary (or even tertiary) spellings that result from misinterpretation of the primary source. These mistakes can be classified into two broad categories.
The first category encompasses typographic errors. When a researcher prepares large extracts from a source, he/she may unintentionally introduce incorrect characters. Statistically, the most frequent errors are the inversion of the order of two letters and the typing of a character whose key is next to the correct one on the keyboard. Yad Vashem’s Pages of Testimony are among the sources where such errors are almost non-existent. The Pages were generally filled in by relatives of Holocaust victims who knew the exact spelling of the names, and their authors normally reread them before submitting them.12 As a result, the very few typographical errors on the Yad Vashem website come from the persons who transformed these handwritten materials into electronic data. They can be eliminated, however, by checking a questionable spelling against the Internet picture of the original Page of Testimony.
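The two statistically dominant kinds of slip can be tested mechanically. The following minimal sketch assumes a US QWERTY layout in which only same-row neighbours count as adjacent, and it treats an inversion as a swap of two neighbouring letters; sources keyed on other layouts would need their own neighbour table. The name `is_typo_variant` is hypothetical.

```python
# Assumption: US QWERTY, and only the keys directly left/right on the same row
# count as "next to" the intended key.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

NEIGHBOURS = {}
for row in ROWS:
    for i, key in enumerate(row):
        NEIGHBOURS[key] = set(row[max(i - 1, 0):i] + row[i + 1:i + 2])


def is_typo_variant(correct, typed):
    """True if `typed` differs from `correct` by one adjacent-key slip
    or by swapping two neighbouring letters."""
    a, b = correct.lower(), typed.lower()
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) == 1:
        # One wrong character: was it struck instead of a neighbouring key?
        i = diffs[0]
        return b[i] in NEIGHBOURS.get(a[i], set())
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        # Two consecutive differences: is it a simple inversion?
        i, j = diffs
        return a[i] == b[j] and a[j] == b[i]
    return False
```

With these assumptions, Lozd is recognized as a slip for Lodz, and Kagsn for Kagan (s sits next to a), while Kagun is rejected.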
The second category covers misinterpretations by persons who worked with handwritten materials. Such errors may already have been present in the first source made public. For example, when studying the voter registration lists for the Duma (Russian parliament) published in official Russian newspapers (1906, 1907, 1912), I noticed a significant number of forms in which some letters were replaced by others that resemble them graphically but not phonetically. The presence of such forms demonstrates unambiguously that those responsible for the printing based their publication on original handwritten lists in which some characters were not written clearly. Letters that look similar in Russian handwriting, and are therefore easily mistaken for one another, include:
- Lower-case letters: а (a) and о (o); г (g) and ч (ch); ш (sh) and м (m); м (m) and т (t); н (n), и (i) and й (j); ф (f) and ер (er); ц (ts) and у (u); ю (yu) and iо (io; used only before 1918, modern ио); ъ (“), ь (‘) and ѣ (e; this letter, called yat’ in Russian, was abolished in 1918); к (k) and х (kh)
- Both upper- and lower-case letters: ш (sh) and щ (shch); м (m) and т (t); б (b) and в (v); к (k) and н (n)
- Only upper-case letters: Г (G) and Т (T)
The above information can be helpful when dealing with extracts made by (or on behalf of) Jewish genealogists from Russian civil records or revision and taxation lists. In rare cases, these confusions also appear in Yad Vashem’s database: Yad Vashem employees respelled the original Cyrillic forms in Latin characters, and in some testimonies the handwriting was ambiguous.13
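The lists above translate directly into a substitution table. The hypothetical helper below generates every form that differs from a Russian name by one misread letter or letter pair. For simplicity (an assumption of the sketch) it lower-cases the name and applies the both-case pairs to lower case as well, so the upper-case-only Г/Т confusion is left out. The resulting variants could then be looked up in an index of extracted names.

```python
# Easily confused Russian handwritten letters from the lists above, in lower case.
# Simplification: both-case pairs are applied to lower case only; Г/Т is omitted.
CONFUSABLE = [
    ("а", "о"), ("г", "ч"), ("ш", "м"), ("м", "т"), ("н", "и", "й"),
    ("ф", "ер"), ("ц", "у"), ("ю", "\u0456\u043e"),  # pre-1918 і followed by о
    ("ъ", "ь", "ѣ"), ("к", "х"), ("ш", "щ"), ("б", "в"), ("к", "н"),
]


def handwriting_variants(name):
    """Forms obtainable from `name` by misreading one letter or letter pair."""
    name = name.lower()
    variants = set()
    for group in CONFUSABLE:
        for src in group:
            pos = name.find(src)
            while pos != -1:
                # Replace this one occurrence by every other member of the group.
                for dst in group:
                    if dst != src:
                        variants.add(name[:pos] + dst + name[pos + len(src):])
                pos = name.find(src, pos + 1)
    variants.discard(name)
    return variants
```

For instance, the variants of Кац include хац, нац, коц and кау.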
Among the various available databases of Jewish surnames, the Ellis Island Jewish passenger lists are, unfortunately, particularly affected by misinterpretations. The original Roman-alphabet records were written by clerks who often had terrible handwriting.14 To complicate matters, these records of people of various origins coming to the U.S. were transcribed by volunteer members of the LDS (Mormon) Church. In many cases, it is clear that these individuals were totally unfamiliar with Germanic and Slavic names. As a result, one finds thousands of incorrect forms.
A number of letters look so similar that they are regularly mistaken for one another: n, u and v; e and c; f, t and l; F, T, L and J; h and k; v and r; r, z and s; r and i; m, rn, w, in and ni; W and M; a and o; a and u; B, R and K; g and y. Software that searches an online database based on original documents written in Latin letters can easily take these confusions into account.
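Because some confusions involve letter sequences of different lengths (m read as rn or in), a plain character-by-character comparison is not enough. The sketch below, whose names are invented for illustration, walks both spellings at once and allows a bounded number of substitutions taken from the list above; the limit `max_subs=3` is an arbitrary assumption. Case is preserved, since F, T, L and J are confused only as capitals.

```python
from functools import lru_cache

# Letters (or sequences) regularly confused in handwriting, as listed above.
GROUPS = [
    ("n", "u", "v"), ("e", "c"), ("f", "t", "l"), ("F", "T", "L", "J"),
    ("h", "k"), ("v", "r"), ("r", "z", "s"), ("r", "i"),
    ("m", "rn", "w", "in", "ni"), ("W", "M"), ("a", "o"), ("a", "u"),
    ("B", "R", "K"), ("g", "y"),
]

# token -> every token it may be misread as (a letter can be in several groups)
CONFUSIONS = {}
for group in GROUPS:
    for token in group:
        CONFUSIONS.setdefault(token, set()).update(t for t in group if t != token)


def could_be_misreading(actual, indexed, max_subs=3):
    """True if `indexed` can arise from `actual` through at most `max_subs` confusions."""
    @lru_cache(maxsize=None)
    def walk(i, j, subs):
        if i == len(actual) and j == len(indexed):
            return True
        # Identical character: advance in both spellings.
        if (i < len(actual) and j < len(indexed)
                and actual[i] == indexed[j] and walk(i + 1, j + 1, subs)):
            return True
        if subs == max_subs:
            return False
        # Try each confusion starting here; the two tokens may differ in length.
        for token, others in CONFUSIONS.items():
            if actual.startswith(token, i):
                for other in others:
                    if indexed.startswith(other, j) and walk(i + len(token), j + len(other), subs + 1):
                        return True
        return False

    return walk(0, 0, 0)
```

Under these rules, Piager, Urnisky, Japolsky and Kerenstein are accepted as possible misreadings of Prager, Urinsky, Topolsky and Berenstein, while Levin is rejected as a misreading of Rubin.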
Several additional general rules that simplify this automatic search can be formulated. During the period when Ellis Island was in use (1892–1924), the names of immigrants from Eastern Europe were never spelled according to the rules of English orthography. The spelling was mostly German; beginning in the 1920s, Polish spelling also became common. As a result, typically English forms are mostly misinterpretations. Here are two examples from Grodno guberniya: the surnames that appear as Shipski and Shongin in the database are actually spelled Slupski and Strongin on the original passenger lists.15 The letter v is rare in German and absent from Polish; consequently, if it is present in a surname, we are probably dealing with a misinterpretation. For example, Disvak, Gilvin, Vikel and Vowagruaska are actually spelled Diwak, Gildin, Nickel and Nowogrudska, respectively.16
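These two rules can be turned into simple warning flags. The function below is a hypothetical heuristic, not part of the Ellis Island website: it flags any v, and any sh that is not part of German sch. A flag does not prove an error; it only suggests that the picture of the original record deserves a look.

```python
import re


def suspicious_features(name):
    """List hypothetical warning signs that an indexed form may be a misreading."""
    lowered = name.lower()
    reasons = []
    if "v" in lowered:
        reasons.append("contains v (rare in German, absent from Polish)")
    # English-style sh that is not part of German sch (Schipin passes, Shipski does not).
    if re.search(r"(?<!c)sh", lowered):
        reasons.append("English-style sh")
    return reasons
```

Applied to the examples above, Disvak is flagged for its v and Shipski for its sh, while Diwak and Schipin pass.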
Unfortunately, the above rules are far from exhaustive. In numerous cases, a misinterpretation is particularly difficult to detect automatically without looking at the picture of the actual record, and some names become almost totally unrecognizable. The two lists below, dealing with surnames of Jews from Grodno and Volhynia guberniyas, illustrate this point. The first form in each pair is the one in the Ellis Island Data Base; the second is the one I was able to read myself by looking directly at the pictures of the original documents available on the same website.17 The forms in brackets [ ] are those used in DJSRE.
Grodno guberniya: Agnoski – Agurski [Agurskij], Atowski – Stawski [Stavskij], Belsusky – Belansky [Belyanskij], Besches – Bosiches [Bosikhes], Bewanowski – Baranowski [Baranovskij], Bivinstein – Burnstein [Burnshtejn], Buleuski – Butenski [Butenskij], Cheten – Cholow [Kholov], Dazrowsky – Lazarowsky [Lazarovskij], Efwimski – Efroimski [Èfroimskij], Findzel – Finckel [Finkel’], Gerlin – Perlin [Perlin], Gumoritz – Gumnitz [Gumnits], Hominski – Slonimski [Slonimskij], Hransowicz – Abramowicz [Abramovich], Hameschin – Manuschkin [Manushkin], – Aszkenazy [Ashkenazi], Kalpson – Halpern [Gal’pern], Kenstein – Orenstein [Orenshtejn], Kinlang – Rimland [Rimlyand], Kirstof – Kustof [Kustov], Knochncz – Kuschner [Kushner], Koffenauer – Hoffmann [Gofman], Kowcikin – Kiweikin [Kivajkin], Kranek – Krawetz [Kravets], Kresner – Kremer [Kremer], Lekvoiski – Zakroiski [Zakrojskij], Rukvi – Rubin [Rubin], Sambaum – Barnbaum [Barnbaum], Schipin – Schifrin [Shifrin], Schkowitz – Selikowitz [Zelikovich], Stukow – Stukar [Shtukar], Swochitzky – Suschitzky [Sushitskij], Tomerkutz – Pomerantz [Pomerants], Tun – Torn [Torn], Turber – Farber [Farber], Walforn – Wolfow [Vol’fov], Wowezyla – Wowczyk [Vovchik], Wulles – Müller [Miller], Zabendive – Babendure [Babindur], Zakfus – Zalzfas [Zal’tsfas], Zumgas – Grüngas [Gringas].
Volhynia: Beinotern – Baustein [Baushtejn], Bertnek – Bertuch [Bertukh], Brunan – Briman [Briman], Buelman – Buchman [Bukhman], Chairus – Chanus [Khanus], Elulehguadi – Ehrlechgerecht [Èrlekhgerekht], Forben – Toiben [Tojben], Grilszka – Gruszka [Grushka], Grites – Wites [Vites], Hamdaw – Lamdan [Lamdan], Karoch – Karsch [Karsh], Kostjar – Kotljar [Kotlyar], Lus – Fux [Fuks], Ozvenes – Czernes [Chernes], Rutinskin – Rubinstein [Rubinshtejn], Seutzky – Slutzky [Slutskij], Wewin – Lewin [Levin], Wirgman – Wugman [Vugman].
Errors of a similar type also characterize the spelling of given names (Ileyer instead of Meyer; Pinchos misread as Tinctos; Selig misinterpreted as Lelip) and toponyms (Waterinoslaw instead of Ekaterinoslaw).
Misinterpretations of handwritten characters also can be found in Hebrew spelling. For example, a number of Roman alphabet forms found on the Yad Vashem website actually result from a misreading of some letters in the original Hebrew spelling. This situation is particularly common for Pages of Testimony compiled during the second half of the 1950s. During that period, almost all Pages dealing with the Holocaust victims from Eastern Europe are written in tiny Hebrew characters that are often difficult to decipher. In this situation, the Yad Vashem employees who recently prepared the searchable database did a titanic job. In the main, they succeeded in recognizing the actual spellings.
As this discussion demonstrates, to help resolve problems related to the possible misinterpretation of handwritten characters, every element (surname, given name, place name) of an online searchable database should indicate whether or not its original source was handwritten. If it was, the alphabet used in the original source (Roman, Cyrillic or Hebrew) should be explicitly identified. This information is of paramount importance when analyzing possible graphic misinterpretations.
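One possible way to store this information is sketched below as a Python data class. The class and field names are assumptions, not an existing schema; the only rule enforced is the one stated above, that a handwritten source must have its alphabet identified. The `confusion_alphabet` method tells a search tool which table of handwriting confusions, if any, applies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Alphabet(Enum):
    ROMAN = "Roman"
    CYRILLIC = "Cyrillic"
    HEBREW = "Hebrew"


@dataclass(frozen=True)
class NameField:
    """One surname, given name or place name in a searchable database (hypothetical schema)."""
    value: str                                  # form as indexed, e.g. "Hransowicz"
    source_handwritten: bool                    # was the original source handwritten?
    source_alphabet: Optional[Alphabet] = None  # required when the source is handwritten

    def __post_init__(self):
        if self.source_handwritten and self.source_alphabet is None:
            raise ValueError("a handwritten source must state its alphabet")

    def confusion_alphabet(self) -> Optional[Alphabet]:
        """Alphabet whose handwriting confusions apply, or None for printed sources."""
        return self.source_alphabet if self.source_handwritten else None
```
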
Conclusion
To take into account the issues discussed in this paper, one can suggest the following multi-method approach, which web software could apply automatically when a user looks for potential equivalents of a surname X in online searchable databases.
- The use of the improved soundex system discussed above
- The transliteration methods described above should be applied in order to transform surname X and the surnames in the list into forms that follow the same orthographic rules (which, in turn, necessarily correspond to the same language). To reduce processing time, the search for potential candidates can be limited to those surnames in the database whose soundex value is identical to that of surname X. After the transliterations are applied, one can automatically establish that, for example, the following forms are just graphic variants of the same names:
– Хомский, Khomskiy, Khomskij, Chomski, Chomsky, Homschi, Jomsqui, חומסקי and כאָמסקי
– Ваксман, Vaksman, Vaxman, Wachsmann, Waxman, Waksman(n), Vacsman, וקסמן and וואקסמאן
– Грин, Grin, Gryn, Grün, Grünn, Grien, Green, Grine, גרין
– Кац, Katz, Kats, Kac, Kaz, Catz, Cats, Caţ, כץ, קאץ
– Чак, Chak, Czak, Tschack, Tschak, Tzack, Csak, Tchac, Chaque, Ceac, טשאק, צ׳ק
- If surnames from the list in which the search is being done originate from a handwritten source, then the list of characters that look most similar in this handwriting (see those mentioned above) should be consulted by the software. After the application of this rule, one can automatically conjecture that the following pairs are equivalent: Furowetzky – Jurowetzky [Yurovetskij], Intkowsky – Jutkowsky [Yudkovskij, Yutkovskij], Japolsky – Topolsky [Topol’skij], Jurmansky – Furmansky [Furmanskij], Katrowitz – Katzowitz [Katsovich], Kerenstein – Berenstein [Berenshtejn], Piager – Prager [Prager], Tewelinsky – Jewelinsky [Evelinskij], Tukazger – Tukazyer [Tukatsier], Urnisky – Urinsky [Urinskij]. In these pairs, the first elements correspond to the spellings found in the Ellis Island Data Base. The second elements correspond to the actual spelling that can be seen on the pictures of the original handwritten documents. The forms in brackets correspond to spellings used in DJSRE.
- As the examples of misinterpreted names from Grodno and Volhynia guberniyas presented above show, the list of possible graphic distortions can never be exhaustive. Characters that look similar vary from one person’s handwriting to another. Consequently, tables of correspondences, although useful, are insufficient in numerous cases. Another method may address this problem: among the surnames in the online database, one should look for those sharing the largest number of characters with surname X. The sequence of these characters is of paramount importance, while their exact position in the surnames plays a secondary role. For example, Hransowicz results from a misreading of Abramowicz.18 Both graphic forms have two character strings in common: ra and owicz. Moreover, these strings are placed similarly in both forms: ra occurs after the initial characters (H or Ab), owicz appears at the end, and additional characters (ns or m) occur between them. That ra starts in the second position in Hransowicz but in the third in Abramowicz is of no importance.
The exact logic of the analytical software can vary, but it is clear that, in principle, the application of this method can easily suggest the potential equivalence of such forms as: Dulik – Dudik [Dudik], Dunewik – Dunewitz [Dunevich], Estoin – Estrin [Èstrin], Flugin – Dlugin [Dlugin], Gwick – Zwick [Tsvik], Iribulski – Pribulski [Pribul’skij], Kinselman – Kimelman [Kimel’man], Koslar – Kotlar [Kotlyar], Kuenetzov – Kuznetzov [Kuznetsov], Musdekatin – Muschkatin [Mushkatin], Natausow – Natanson [Natanson], Ohwalowski – Chwalowski [Khvalovskij], Pokol – Sokol [Sokol], Rabita – Rabitz [Rabits], Tolak – Polak [Polyak], Wolpiznsky – Wolpiansky [Volpyanskij], Yabe – Gabe [Gabe].
- While the two previous methods deal with possible misinterpretations of handwritten materials, an additional method should take into account possible typing errors. This covers:
- typing of a character situated on the keyboard next to the correct one
- inverting the order of two characters.
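The second method in the list above, bringing all spellings under the same orthographic rules, can be sketched as follows. The rule tables are hypothetical and cover only a handful of German, Polish and English conventions; they are not the transliteration tables discussed earlier in this paper, and their output is a comparison key rather than a proper spelling. Rules are applied in one left-to-right pass, longest match first, and doubled letters are collapsed at the end.

```python
import re
import unicodedata

# Hypothetical rules mapping three source orthographies to one comparison key.
RULES = {
    "german": [("tsch", "ch"), ("sch", "sh"), ("chs", "ks"), ("ch", "kh"),
               ("tz", "ts"), ("z", "ts"), ("ie", "i"), ("ü", "i"),
               ("w", "v"), ("j", "y")],
    "polish": [("szcz", "shch"), ("sz", "sh"), ("cz", "ch"), ("ch", "kh"),
               ("c", "ts"), ("y", "i"), ("w", "v"), ("j", "y")],
    "english": [("ee", "i"), ("x", "ks")],
}


def common_key(name, orthography):
    """Rewrite `name` from the given orthography into a shared comparison key."""
    rules = sorted(RULES[orthography], key=lambda rule: -len(rule[0]))
    text = unicodedata.normalize("NFC", name.lower())
    out, i = [], 0
    while i < len(text):
        # Longest match first, in a single pass, so that the output of one rule
        # is never re-read by another (tsch -> ch must not then become kh).
        for src, dst in rules:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            out.append(text[i])
            i += 1
    # Collapse doubled letters (Wachsmann -> vaksman).
    return re.sub(r"(.)\1+", r"\1", "".join(out))
```

With these rules, Vaksman and Vaxman (English), Wachsmann (German) and Waksman (Polish) all reduce to vaksman, and Grin, Green, Gryn, Grün and Grien all reduce to grin.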
The five methods suggested above are not alternatives: they should be used in conjunction with each other. The first two methods concern the search for phonetic equivalents. The three subsequent methods all deal with graphic equivalences.
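For the fourth method (shared character strings), Python's standard `difflib.SequenceMatcher` offers a ready-made approximation: it finds the longest common block, then recurses on the text to its left and right, so it rewards strings shared in the same order regardless of their exact positions. Using it here is an illustrative choice, not a prescribed part of the method, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def common_strings(x, y):
    """Character strings shared by x and y, in order, wherever they occur."""
    a, b = x.lower(), y.lower()
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a dummy block of size 0, skipped here.
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]


def rank_by_shared_strings(x, names, top=5):
    """Order candidate names by how much of their letter sequence they share with x."""
    def score(name):
        return SequenceMatcher(None, x.lower(), name.lower()).ratio()
    return sorted(names, key=score, reverse=True)[:top]
```

For Hransowicz and Abramowicz, the shared blocks are ra and owicz and the similarity ratio is 0.7, which ranks Abramowicz ahead of unrelated names such as Hirsch and Kagan.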
Among the searchable databases available to Jewish genealogists, one already combines methods that have much in common with those discussed in this paper: the Central Database of Shoah Victims’ Names of Yad Vashem (Jerusalem). Its advanced search provides the following options:
- Soundex: see method (1) described above (with the possible improvements suggested)
- Synonyms. In principle, for surnames this should be equivalent to our transliteration approach (method (2)), although the rules currently applied by Yad Vashem could be amended to take into account the numerous additional interchangeable elements discussed in the section Transliteration.
- Fuzzy. This feature allows one “wildcard”: for example, Sarah and Sarha, or Lodz and Lozd, are treated as equivalent. It is a particular case of the more sophisticated procedure formed by methods (4) and (5).
As a result, the information presented in this paper may be used to enhance the search capabilities of this database (to my mind, the most advanced one) as well as many others.
Notes
1. In modern French, ai, au, eu and ei are monophthongs. This fact of French orthography is, however, irrelevant for our topic: the surnames from Alsace-Lorraine and those borne by migrants from Central and Eastern Europe are not of French origin, and, as a result, in the original pronunciation of these surnames the letter combinations in question never correspond to monophthongs.
2. This kind of treatment is similar to that of ch in the existing system, which also gives rise to two numbers: 4 (the sound /ch/ as in English child) or 5 (the sound /kh/ as in the English word loch or the given name Chaim).
3. The only exceptions are surnames from Courland, which were spelled in German until the 1890s.
4. Recently, Yad Vashem also made it possible to search its database in Russian. One can observe, however, that some Russian spellings are not directly extracted from the original Russian documents but are transliterations from the English forms (which in turn were transliterated from Russian). For example, the surname Слуцкий (the correct Russian form, readable on the photocopies of the original documents) often appears as Слуцки, that is, as a direct transliteration of Slutzki, the Latin-character form of this name.
5. Actually, in some cases, the Yad Vashem database also uses such elements as ay, ey, oy and uy even when transliterating names from recent testimonies of former Soviet citizens. They are, however, significantly less frequent than ai, ei, oi and ui, respectively.
6. It is only during the last decades that the English transliteration became more frequently used than the German one.
7. Its main rules are presented in Table 3.
8. In the few cases when the spelling is English (such as Shneider rather than Schneider or Sznajder), one invariably finds that the given names are also English or Anglicized. Consequently, these passengers were not coming to the U.S. for the first time; most likely they were Jews who had immigrated to North America in previous years and were now returning from Eastern Europe (generally from their native towns).
9. Among the rare exceptions is Janowicz (Polish spelling) from Kobrin, who came to the U.S. in 1913 via Hamburg.
10. The term applied to the Polish territory that was part of the Russian Empire from 1815 to 1917. At the turn of the 19th and 20th centuries it consisted of ten guberniyas, with Warsaw (Warszawa) as its capital. This area is also called the Kingdom of Poland and (popularly) Russian Poland.
11. At that time, it covered Hungary proper, Slovakia, Ruthenia (the Transcarpathian part of Ukraine) and Transylvania (now in Romania).
12. A few typos are still present even in the Pages of Testimony. This occurs when the document is filled out by elderly persons who confuse certain letters. One example is Фуrфайn (correct Russian Фурфайн; DJSRE: Furfajn), in which r and n are taken from the Roman alphabet, while the other letters are Cyrillic.
13. Names on the Pages of Testimony are often written almost calligraphically in order to avoid any ambiguity. My experience working with this database shows that problems of interpretation are generally limited to cases when a whole series of Pages was filled in by one volunteer who interviewed elderly persons from his native town. For example, dozens of spellings provided by one individual from Vinnitsa (Ukraine) are almost illegible; unfortunately, he had particularly bad handwriting.
14. Passenger lists started to be typewritten only during the 1920s.
15. English spelling is relevant only for passengers coming to North America from England. Some of them bear surnames that clearly originated in Eastern Europe. These families probably made a second migration, the first having been from the Russian Empire to England. For this reason, these individuals often have Anglicized given names.
16. Misreading of letters is the most common, but not the only, type of error found in the Ellis Island Passenger Lists searchable database. Sometimes the second given name is taken to be the beginning of the surname: Mowscha Ber Kriger appears as Mowscha Berkriger, and Chaje Masche Furman as Chaje Maschefurman. Also, certain Gentiles are taken for Jews and vice versa. For example, Ossip Schotzik (misspelled as Schotzin), a Christian Pole (as written on the original document), appears in the database as if he were Jewish. The clerks on Ellis Island also made errors. For example, the members of the Panfilow family from Volhynia are labeled “Hebrew” on the passenger list, yet their given names (Adam, Praskovia and Michail) undoubtedly identify them as non-Jews. A female passenger named Goldschnait from Rovno had a brother named Gitschneider [Gitshnajder]. Another migrant’s surname is spelled Majtlowski, while this person’s relatives are given another, correct, form: Matlowski [Matlovskij].
17. Sometimes, when the handwriting is really difficult to decipher, it is useful to consult other columns of the list: (1) nearest relative or friend in country whence alien came; and (2) whether going to join a relative or friend, and if so, what relative or friend. In these columns, one often finds the same surname spelled differently, which can clarify letters that are obscure in the migrant’s name.
18. This and the following examples all come from my experience working with the Ellis Island Data Base.
Alexander Beider is a linguist and the author of a number of books and papers dealing with the etymology of Ashkenazic surnames and given names, as well as the history of Yiddish. Born in Moscow, he currently lives with his family in Paris.