Note for the thirty or so followers still tagging along: this blog is now officially defunct and is only going to be kept around as an archive of my earlier work. I welcome you to continue following Freelance Reconstruction at its new home on Wordpress — or, alternately, my recently created specifically-for-tumblr linguistics blog @possessivesuffix.
Péter Hajdú has listed examples of the differences in vocabulary of the three major dialectal groups he calls Taz (north-western), Tym (central), and Ket (south).
Hajdú, Péter: The Samoyed peoples and languages 1962
I guess he means “differences in phonetics”? All of these word-triplets are obviously related. There are probably actual lexical differences too, though, even if perhaps not in vocabulary that is this basic. (All of these words go back to Proto-Samoyedic, at least half of them also further back to Proto-Uralic.)
The Southern word for ‘frog’ should probably be čamǯe, by the way, with an affricate and not a vowel of unknown quality. (Or a central vowel, if this were IPA and not UPA transcription.)
But yeah, everyone who is still following this blog should probably also go follow uralic-solidarity. They’ve been posting plenty of good basic linguistic info about the Uralic languages lately.
The closest relative of the Enets language is Nenets, more specifically its tundra dialect. The name Enets comes from the Enets language itself: Enets people call themselves онаэ энечео (onae enecheo) or онай энчу (onaj enchu), ‘real men.’ Juha Janhunen has compiled some more Enets words, some of them collected twice, first by Castrén in 1854 and 1855 and then by Tereščenko in 1966, 1971 and 1977. Changes that happened over those 100 years can be seen in most examples, although Castrén and Tereščenko might have collected them from different dialects. The examples of dialectal differences listed by Péter Hajdú suggest that these samples are from the Baj dialect. (See previous post and the Enets word for crust on snow in both dialects.) Estimated Proto-Uralic form is marked by an asterisk.
A minor sidenote: your asterisked forms are Proto-Samoyedic roots, not Proto-Uralic.
Incidentally this little list also shows some of the many different (pre)historical connections of the Samoyedic languages, so I hope you don’t mind a little tangent.
Two of the words are indeed ancient inheritance: *korå comes from Proto-Uralic *kojəra which is also the root of e.g. Finnish koiras. *kot comes from Proto-Uralic *kosə, whose other descendants include e.g. Northern Sami gossat, meaning ‘to cough’.
*päjmå has a different background: this is an old loanword from the Turkic languages. In modern Turkish the corresponding word is başmak (though Wiktionary says this is dialectal and not used in standard Turkish).
I do not know if a pre-Samoyedic origin of *nårå is known. There might not be; the Samoyedic languages have a lot of word roots of unknown origin, not found in the other Uralic languages or in their neighbors. It’s probable that Proto-Samoyedic, when it first spread across central Siberia around 3000 to 4000 years ago, assimilated several earlier languages spoken in the area.
Grizelda Kristina died in Canada after spending her last years helping to document a language that modern-day Latvians cannot understand
Another one down. I'm hoping it'll be a while before the next bad news.
(Though second-language speakers still remain, with some interest in the language having recently resurfaced in its former homeland in northern Latvia. Livonian, or its Western dialects anyway, being a fairly well-recorded language, its spirit may linger on for a while yet.)
This list is quite old, but I’m pretty sure it is still useful for those who are interested in the Finnish-Hungarian related words (sorry, Estonian is not on the list - I’ll post a link to better list later when I find something useful). Please, read the introduction before you go to the list - it includes a lot of useful information.
A probably well-known introductory resource. It would be interesting to have an updated version of this at some point. (Who knows, I might put one together myself at some point.)
If anyone is interested in some analysis of the list, aside from Laakso's references there is also a handy Wikipedia page Regular sound correspondences between Hungarian and other Uralic languages (which I've recently been expanding and cleaning up a bit).
As commented before, Freelance Reconstruction is switching platforms. I've set up a basic Wordpress account and it's already clear the service is better suited for technical science blogging than Tumblr.
For the time being I'll probably keep this account still around too though, for popular science, discussion & other such topics that seem to not fit all that well in between serious research-y posts. Haven't decided yet if I'll migrate my previous posts of the latter sort over or not…
A trend I have noticed about historical Ugristics is that Eastern Khanty, the Far Eastern dialect group in particular, tends to be assumed extremely conservative among the Khanty varieties.
Certainly Eastern has a ton of vowel distinctions that are not found in the west, retains the retroflex consonants /ɳ ɭ/, is the only dialect group to reflect a distinction between *l and *ɬ, and presumably more, but still, "key languages" (/varieties) do not exist & the situation here should not be taken as proto-Khanty by default.
One assertion I've seen here and there is to the effect that spirantization of *k is "almost universal" in Khanty. In the Eastern dialects /ɣ/ is indeed found frequently. Taking up again a "provided correct comparisons, a correct reconstruction must be deducible from the data alone" voyage though (this reflects an attitude drilled into me by my studies in mathematics and, to a lesser extent, physics and chemistry), it seems that this has two different sets of correspondences in the Western varieties (Northern Khanty + Southern Khanty).
East /ɣ/ can firstly correspond to /w/ in the West. Some examples:
– "to cut": Vakh (Far East) /øːɣət-/ ~ Demyanka (South) & Obdorsk (North) /eːwət-/ (← PU *äktä-, cf. South Sami aektedh "to cut apart neck vertebrae")
– "to row": Vakh /laːɣəŋt-/ ~ Dem. /tew-/ ~ Nizyam (North) /towət-/ (← PU *suxə-, cf. Finnish soutaa)
– "clay": Vakh /saɣɯː/ ~ Kazym (North) /sowi/ (← PU *śawə, cf. Finnish savi)
The other pattern is E /ɣ/ corresponding to W /x/.
– "liver": Vakh /muːɣəl/ ~ Dem. /muːxət/ ~ Obd. /maxəl/ (← PU *mëksa, cf. Finnish maksa)
– "cheek": Vakh /puːɣləm/ ~ Dem. /poxtəm/ ~ Kazym /poxəɬmaː/ (← PU *poskə, cf. Finnish poski)
– "pole": Vakh /saːɣəl/ ~ Dem. /saːxət/ ~ Obd. /saːxəl/ (← PU *śëlka, cf. Finnish salko)
Now, word-initially, Western /x/ results from spirantization of *k before back vowels. Presumably this was via *k → [q] → [χ], paralleling similar developments in Selkup as well as tons of Turkic languages. (While I've never seen allophonic uvulars in Khanty, or Mansi for that matter, explicitly recognized, they're right there in all the original field records I've seen quoted.) This suggests the same origin for word-internal /x/ as well. And the type 2 correspondences of Eastern /ɣ/ indeed occur solely in back-vocalic words! At least in inherited Uralic vocabulary that I've scanned thru. If unetymologized exceptions were to occur, I'd take that as evidence for them being of later than proto-Khanty origin actually…
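The conditioning claimed above can be checked mechanically on the six examples quoted so far. What follows is only a rough sketch: the grouping of vowels into back and front (`BACK_VOWELS`, `FRONT_VOWELS`) is an assumption made for illustration, the helper names are hypothetical, and one Western form per word has been picked from those cited.

```python
# Assumed vowel classes for this sketch only; not a claim about Khanty phonology.
BACK_VOWELS = set("aouɔɯ")
FRONT_VOWELS = set("eiøy")
VOWELS = BACK_VOWELS | FRONT_VOWELS | {"ə"}

# gloss -> (Vakh form, one Western form), as quoted in the post
EXAMPLES = {
    "cut":   ("øːɣət", "eːwət"),
    "row":   ("laːɣəŋt", "towət"),
    "clay":  ("saɣɯː", "sowi"),
    "liver": ("muːɣəl", "muːxət"),
    "cheek": ("puːɣləm", "poxtəm"),
    "pole":  ("saːɣəl", "saːxət"),
}

def first_vowel(form):
    """Return the first vowel symbol of a transcribed form."""
    return next(ch for ch in form if ch in VOWELS)

def western_reflex(west):
    """Return the first non-initial w or x of a Western form."""
    return next(ch for ch in west[1:] if ch in "wx")

def classify(examples):
    """Group glosses by Western reflex, noting whether the Vakh word is back-vocalic."""
    groups = {"w": [], "x": []}
    for gloss, (vakh, west) in examples.items():
        is_back = first_vowel(vakh) in BACK_VOWELS
        groups[western_reflex(west)].append((gloss, is_back))
    return groups

groups = classify(EXAMPLES)
for reflex, items in groups.items():
    print(reflex, items)
```

On these six items the /x/ group comes out back-vocalic throughout, while the /w/ group is mixed. A sample this small proves nothing by itself, of course; the point is only that the check is easy to rerun on a larger word list.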
It also seems to be the case that Eastern medial /k/ occurs (again, in Uralic vocabulary) almost solely in front-vocalic words. There is Vakh /nʲalkɯː/ "fir" (← PU *ńulka, cf. Mari нулго /nulɣo/), but this might be due to being preceded by a consonant, not a vowel.
Additionally, while /ɣ/ ~ /w/ can also originate from PU *x, *w and occasionally even *ŋ, the cognates in the other Uralic languages almost always speak for /ɣ/ ~ /x/ originating from PU *k. One apparent exception is PU *joŋsə "bow" — e.g. Dem. /joːxət/ — but a development *ŋs → *ŋks → *ks seems to have occurred widely across Uralic in this word.
My conclusion is that the latter correspondence thus represents Proto-Khanty *k rather than *ɣ. The /ɣ/ reflex in Eastern Khanty appears to parallel the lenition of *p in Far Eastern dialects (e.g. "hair": Vakh /aːwət/ versus Obd. /oːpət/ (← PU *ëptə, cf. South Sami voepte)).
I will still need to re-check what standard resources actually say; it's possible I'm reinventing the wheel here. :) I do however recall seeing a claim that /ɣ/ ~ /x/ would actually go back to *ɣ rather than *k, and /ɣ/ ~ /w/ to *w. (I don't currently have an opinion on if *w or *ɣ would be more warranted for the latter.) OTOH, at least Viitso's studies in early Uralic consonant isoglosses assert *-w- → *-ɣ- as universal in Khanty. I do not know what his opinion on the development of medial *k is.
I tentatively also note there seem to be similar but in some ways different things going on word-finally.
*k closing a front-vocalic monosyllable appears to lenite as if it were *[q]. E.g. "burbot": Vakh /seɣ/, Dem. /sex/ (← PU *śäkä or *śekä, cf. Finnish säkä, säkiä "catfish")
*k closing a back-vocalic monosyllable, paradoxically, appears not to lenite. E.g. "hill": Tremyugan (Central East) /tʲak/ (← PU *ćukka, cf. Northern Sami čohkka "summit") — or is there a trace of the PU *kk : *k contrast here??
*k closing a 2nd+ syllable doesn't seem to lenite either. E.g. "raven": Vakh /kɔːɭək/, Dem. /xuːləx/, Obd. /xoːləx/ (← East Uralic *kulɜkɜ, cf. Hungarian holló)
*p closing a monosyllable?: Vakh /tew-/, Dem. /tep-/, Obd. /tap-/ "to err" (← Ugric *tep-, cf. Core Mansi /tip-/, Hungarian téved). This is a verb though so soundlaws for medial position might still apply.
I even notice some evidence suggesting an inverse development from final *w or *ɣ to /p/ in a 2nd+ syllable in Western Khanty (= Southern + Northern)?
– Dem. /tuːp/, Obd. /luːp/ "oar" (← PU *suxə- "to row") seems particularly clear, unless there is some nominal suffix here? (Edit: yes there is, as indicated by Mansi *tup "oar".)
– "Light, sparse": Vakh /loːrəw/, Dem. /tuːrəp/ (← PU *šarwa? cf. Finnish harva)
– "Tent": Vakh /oːləw/, Dem. & Niz. /uːtəp/; the word has original *m (← PU #owðəm? cf. Finnish uudin : uutime- "curtain") so *p can probably be ruled out. This could also be related to the lenition *m → *b/*w̃? → *w found regularly in Hungarian in derivational suffixes, sporadically elsewhere as well (e.g. *ńälə-mä → nyelv "tung", *nimə → név "name").
– "Membrane": Vakh /kalʲɯː/, Krasnoyarsk (North) /xalʲəx/, Obd. /xalʲəp/. Has Kr. substituted a different suffix here or could this be taken as evidence for a split development along the lines of *p ↔ *w ↔ *ɣ → x? Hungarian hályog ("cataract") does not help, suggesting either original *ŋ (which interestingly would still be compatible with Finnish kalvo), variety in suffixation, or loan origin. The unetymological Khanty *lʲ, Hungarian ly lends some support to the last option of these.
Posts have been scarce recently, although I would have much to talk about here. I am finding Tumblr's new post editor incredibly slow, especially with long posts. This might be avoidable even without updating from my old laptop (I don't even know how many tabs I have open right now)… but regardless, it seems to be time to check out if other blogging services would fare better. Regular delays of up to 2-3 seconds for typed text to show up (sometimes even for just a single letter) cannot be tolerated.
Before I go on, time for a small general overview I’ve been meaning to write for a while, to keep my readership on board.
Word structure in the languages of the world varies greatly. In most recorded or reconstructed languages of the world, not all permissible word shapes are however equally common: certain phonotactic shapes could be considered “typical”, and others “marginal”. Proto-Uralic is no exception to this.
According to long-standing general consensus (the reasons for which would not be difficult to demonstrate, but this would take too long here), most typically PU roots were made of two syllables. These were not equivalent however: the second, unstressed syllable was almost always a simple short, open syllable with a single initial consonant (though word-final consonants could occur in inflection), while the initial, stressed syllable could well have a final consonant, or lack an initial one. So a generalized syllable structure cannot really be stated: “(C)V(C)” would be overly general, “CV” would be too limited. We can easily join these for a “root structure” formula though: (C)V(C)CV.
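For readers who think better in patterns than in formulas, the root structure (C)V(C)CV can be written as a regular expression. This is a minimal sketch: the vowel inventory in `VOWELS` is an assumption collected from symbols used on this blog, the function name `is_typical_root` is made up for the example, and the palatalization mark ʲ is treated as part of the preceding consonant.

```python
import re
import unicodedata

# Assumed set of vowel symbols for this sketch; everything else counts as a consonant.
VOWELS = "aäeioöuüëïəɜ"
C = f"[^{VOWELS}ʲ]ʲ?"   # one consonant, optionally palatalized (e.g. ðʲ)
V = f"[{VOWELS}]"       # one vowel

# (C)V(C)CV: optional onset, stressed vowel, optional coda, one consonant, one vowel
ROOT_SHAPE = re.compile(f"(?:{C})?{V}(?:{C})?{C}{V}")

def is_typical_root(root):
    """True if a reconstruction such as '*pala' or '*äktä-' fits (C)V(C)CV."""
    bare = unicodedata.normalize("NFC", root).lstrip("*").rstrip("-")
    return ROOT_SHAPE.fullmatch(bare) is not None

for root in ["*pala", "*wetə", "*äktä-", "*mëksa", "*kojəra"]:
    print(root, is_typical_root(root))
```

Of the forms tried here, only trisyllabic *kojəra falls outside the pattern, which is what the formula predicts: it describes the typical root shape, not every permissible one.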
The asymmetry between stressed and unstressed syllables goes deeper still though. All reconstructions agree that at least six (but probably at least eight) vowels were distinguished in the initial syllable — and that no more than two options were consistently available in the next one. It is likely (but not universally accepted) that in part this was due to vowel harmony: the corresponding back and front open vowels *a and *ä were not contrasted, but both words such as *pala “bit” and *pälä “half” could still occur, ie. with *a found following back vowels, and *ä following front vowels. [1]
Aside from the open vowel option, another quality also occurred. What specific sound value(s) this had is a disputed issue. Maximal phonetic differentiation would call for a close vowel such as *i (possibly with a vowel-harmonic counterpart *ï), and this is the transcription that may have the widest use in the literature these days: e.g. *weti “water”. However, the actual evidence does not favor this: the non-open vowel is widely across Uralic rather reflected as a reduced vowel, something like *ə. Even Finnic /i/ only occurs word-finally, while in other positions /e/ is found (which adds up to alternation in e-stem nouns). [2]
The current transcription I use on this blog follows the scheme a/ä/ə. For an original open stem vowel, I write *a after back vowels and *ä after front vowels. If I need to speak of these classes as a whole I use the cover symbol *A. For an original non-open stem vowel, I write *ə. Therefore, a reconstruction such as *wetə is completely equivalent with standard *weti.
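The cover symbol *A can be spelled out mechanically. A small sketch follows, assuming the usual split of first-syllable vowels into a front set and a back set; the sets `FRONT` and `BACK` and the function name `spell_out` are illustrative assumptions, not part of any standard notation.

```python
# Assumed harmony classes of Proto-Uralic first-syllable vowels (illustrative only).
FRONT = set("äeiü")
BACK = set("aouëï")

def spell_out(root):
    """Replace the cover symbol A by a or ä according to the first vowel of the root."""
    first = next(ch for ch in root if ch in FRONT or ch in BACK)
    return root.replace("A", "a" if first in BACK else "ä")

for root in ["palA", "pälA", "mëksA", "küńArA", "wetə"]:
    print(root, "->", spell_out(root))
```

Roots without the cover symbol, such as *wetə, pass through unchanged, since *ə is written the same way in both harmony classes.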
You may occasionally also see the symbol ɜ. This is traditional Uralistic transcription for a vowel whose quality cannot be determined. Certain Uralic languages have lost or merged all unstressed vowels, and today their basic inherited roots have a monosyllabic shape, most commonly (C)V(C)(C). For PU roots that have not survived in one of the three branches that fairly consistently preserve unstressed vowels (Finnic, Samic, and Samoyedic), the contrast between *A and *ə may not be recoverable. I find this less intrusiv than the practice common elsewhere in historical linguistics of simply writing “V” for “vowel”: making up an example, compare *tirɜ vs. *tirV?
(Another reconstruction notation worth mentioning at this point: while *asterisks normally mark reconstructed items, I’ve picked up from somewhere the idea to use #hashes for approximate pseudo-reconstructions that don’t actually rely on known regular correspondences. This is useful with e.g. “messy” roots that show very divergent shapes all over the family, or when referring to data from families I am not particularly familiar with.)
In addition to *A-stems and *ə-stems, a third type has also been considered fairly basic. Some Finnic roots show what are called “primary” long vowels, corresponding to single vowels in most other Uralic languages. [3] A good example may be the root for “language”: Finnish kieli (kiele-), Northern Sami giella, Mokša /käĺ/, Udmurt /kɨl/, Komi /kɨv/, Khanty *kööɬ, Nganasan /kieja/, to pick a few reflexes. Curiously these only occurred with the stem vowel *-ə. This led to a third basic root type with assumed original long vowels, *(C)VVCə (or *(C)VVCi) hanging around in reconstructions for a while.
A full coverage of the story of these would be too much of a digression (and I’ve covered some of it before), but suffice it to say that these days it seems that long vowels, or anything equivalent, are actually not necessary for Proto-Uralic after all. The “language” root above may be reconstructed as *kälə, for example. I point those interested in the details towards a recent article by A. Aikio. [4]
Many further observations on PU root structure would be possible, but this should suffice to keep readers on board with some future posts.
[1] The International Phonetic Alphabet defines [a] as an open front or central vowel, but in Uralistics, *a and *ä refer to the vowels known as [ɑ] and [æ] in the IPA, after the orthography of Finnish and Estonian.
[2] A recent article exploring this idea is Petri Kallio (2012): The non-initial-syllable vowel reductions from Proto-Uralic to Proto-Finnic (SUST 264). Some of the topics are better explained by Aikio in the same volume (see note 4 below), but I find the basic thesis on *ə being a better reconstruction than *i convincing.
[3] This is in contrast to long vowels that have arisen from loss of former consonants and thus correspond to VC(V) structures in (some) other Uralic languages.
[4] Ante Aikio (2012): On Finnic long vowels, Samoyed vowel sequences, and Proto-Uralic *x (SUST 264)
The phonology of the Mordvinic languages is, from a historical point of view, deceptively simple. While they cannot really be called "archaic" in the same sense Finnish has been, the group is rather short on any characteristic phonological features. Sure, there have been a couple general innovations, but most of these are shared widely with the other Uralic languages. The spirants *ð, *ðʲ, *x and the velar nasal *ŋ (though the last one is reported from dialectal Erzya around the beginning of the 1900s; no idea if it's still kicking) were quite volatile consonants and them having merged with other consonants like *t or *j is not a big surprise. The same applies to the "umlaut" vowels *ü and *ë. The lenition of word-medial voiceless consonants followed by shortening of geminates (to reintroduce intervocalic [p, t, k]) is found in all the neighbors: Mari, Permic, the now relocated Hungarian, and even the Finnic Veps. The shift *w → *v has at some point hit approximately every language east of the Atlantic, west of the Urals and north of the Caucasus. Reduction and/or loss of final vowels is one of the most common sound changes in any language anywhere.
The principal common Mordvinic innovation that can actually be seen in the phoneme inventories may be the rampant palatalization. The Proto-Uralic contrast between *n : *ń and *s : *ś (the corresponding contrast among the spirants did not survive) has been extended with /ť, ď, ŕ, ĺ/: even then this is actually only minimally phonemic, in the vicinity of back vowels, thanks to a couple changes like *rj → *ŕ. Palatalized allophones of all other consonants exist as well… but still only as allophones. This is quite unlike language groups like Romance, Slavic or Indo-Aryan, where palatalized consonants like *kʲ have typically only hung around for a short while before deciding to decay into a multitude of sibilants and affricates. Perhaps because Mordvinic already has a multitude, being one of the two Uralic branches to retain the three-way distinction between PU *s, *ś and *š.
The shift away from initial stress also probably counts. This is most strikingly reflected in words where initial consonant clusters have arisen — words like Erzya /kštada/ "bare" or Mokša /fkä/ "1" would be a challenge for speakers of most Uralic languages.
But beyond these, Mordvinic has been content to spend its "sound change quota" on "minor" sound changes that simply shuffle around the phonemes. No large-scale sibilant mergers as in Mari. No vowel length has arisen as in Finnic or Ugric. No denasalization of clusters as in Permic, Hungarian and West Samic, no consonant gradation as in Finnic and Samic, no cluster metatheses as in Permic and Ugric, no spirantization of initial stops as in Hungarian, most of Samoyedic and northern Ob-Ugric. One could say that Mordvinic sounds essentially like Proto-Uralic spoken with a heavy Russian accent. (Then again, given that Uralic was there first, maybe saying that Russian sounds essentially like Proto-Slavic spoken with a heavy Mordvinic accent would be more accurate.)
…OK, I was going somewhere else with this, but it's late and I'm getting sidetracked. Let's try again tomorrow.
There is one issue in Uralic subgrouping that has had me particularly wary in the wake of the news that separating Finno-Ugric from Samoyedic isn't actually valid, and that is Mansi.
Most of the nine basic subgroups of Uralic are obviously proper clades, as can be quickly confirmed by e.g. the existence of at least some soundlaws within them that are entirely unique in the whole family, and a number of further ones applying to the entire group. This is not the sole possible method for identifying a proper subgroup as such, but it might be the most convenient one.
(Here is a quick set of individual markers for all the other main groups, sticking with mostly consonants as their original distribution is easier to reconstruct. Samic: *x → *k; Finnic: *ti → *ci; Mordvinic: *#ü → *wi; Mari: *-ð- → ∅; Permic: *lm → *nm; Hungarian: *s → ∅; Khanty: *s-ś → *ɬ-s → *s-s; Samoyedic: *l → *j.)
But I have previously not been able to locate a single good perimeter of this kind for the Mansi language, or, more accurately, group of languages. Mansi has four basic dialect groups, laconically titled North, East, West & South (the last two are extinct, AFAIK). These are all quite different and a maximalist approach could recognize up to four separate "Mansic" languages. Among these there is a twofold basic split: N+E+W form a group I've taken up calling Core Mansi, which stands in opposition to South Mansi (also known as Tavdin). These two being valid groups I am sufficiently convinced of.
The literature out there does present a couple changes as general Mansi innovations. One is *č → *š. I however consider this as being of low evidential value, as this is a change that has a wide distribution: it also reached Hungarian, as well as several Khanty dialects. I also wonder if this should be considered a development simultaneous with *ć → *ś which applies to Core Mansi only, and is again found across Khanty dialects as well. A similar critique applies to *ɬ → *t, shared with the entire Samoyedic group as well as central Khanty dialects, so this too was a change that applied across several already separated language varieties, possibly in several waves.
Where things get downright suspicious is that in several vowel correspondences Tavdin appears to remain closer to the original state of affairs, while Core Mansi and Khanty have shared a development:
PU *o → Tv *aa; CMs *oo; Kh *oo
PU *a → Tv *oo; CMs *uu; Kh *uu
PU *ü → Tv *ü; CMs *(ʷ)ä; Kh *ö
Scholars have various views on these correspondences. Honti (1998) reconstructs in the 1st class of words Proto-Ob-Ugric *oo, which would develop into *aa in Proto-Mansi, then back to *oo in Core Mansi; a rather awkward solution. For the 2nd he goes with *uu, which is better but still suffers from having to assume two pre-Ob-Ugric rounds of vowel raising + unconditional long vowel lowering in Tavdin. Sammallahti (1988) reconstructs *oa, *uo, *ʏ, all of which seem like compromise intermediate values. I would reconstruct simply *aa, *oo, *ü which were retained in Tv. but shifted further in the other two groups.
(Why *o ends up as *aa, but *a as *oo is a puzzle that'll have to wait for another day.)
It can also be noted that even the ethnonym "Mansi" must be a retention rather than an innovation, as it is etymologically identical with "Magyar", the Hungarian self-appellation, and has cognates in Khanty as well.
So, what I've done is browse thru a large bundle of Uralic etymological material that can be found in Ob-Ugric, while looking for common Mansi innovations — or, alternately, innovations that could suggest other grouping, such as Core Mansi + Khanty.
The results do seem to confirm the validity of Mansi. But now I at least have some solid soundlaw(ish) evidence to assert this. And under the cut, so will you!
Only a single soundlaw clearly applicable to all of Mansi, and only Mansi, emerges. This is the loss of *p before another obstruent. Sammallahti mentions this tangentially. For example, PU *sopśə "net needle" yields Mansi *taas; PU *aptɜ- "to bark" (‹ɜ› being a Uralistics notation for a vowel of unknown quality) yields Mansi *oot-; and PU *ipsə "smell" yields Mansi *ät.
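As a conditioned sound change, this one is simple enough to write down as a one-line rewrite rule. The sketch below applies only that single step to the PU forms; it does not attempt to derive the actual Mansi reflexes, which have gone through further changes in both the vowels and the sibilants. The obstruent list `OBSTRUENTS_AFTER_P` and the name `lose_p` are assumptions for illustration, and *p itself is left out of the list so that geminate *pp is not touched.

```python
import re

# Assumed set of obstruents that can follow *p (illustrative; excludes p itself).
OBSTRUENTS_AFTER_P = "tkcćčsśš"

# *p is deleted when immediately followed by one of these obstruents.
P_LOSS = re.compile(f"p(?=[{OBSTRUENTS_AFTER_P}])")

def lose_p(form):
    """Apply the single rule p > zero before another obstruent."""
    return P_LOSS.sub("", form)

for form in ["sopśə", "aptɜ", "ipsə", "poskə"]:
    print(form, "->", lose_p(form))
```

The last item, *poskə, is there as a control: its *p is initial and prevocalic, so the rule leaves it alone.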
Under my previous standards, I'd be happy with just this. Highly specific conditional developments like this are not a type of sound change that would commonly spread widely across dialect continuums. However, looking closer, there is plenty more evidence to be found in just the phonetic data, without a need to start digging thru grammatical details.
A soundlaw that also seems basically applicable is a reduction *nč → *š word-finally, ie. in nominals. An exception though is *künčə "nail" → Tavdin künš. Likewise *šäänš "knee" which seems to have avoided this law entirely. It may be a later loan though, this is one of the words with the correspondence Mansi *ää ~ Khanty *ää which generally display several irregularities and a lot of the time do not seem to derive from Proto-Uralic.
Non-productiv derivational suffixes with no apparent meaning ("root extensions") are also valid evidence. I noted four: *ool+ć "chin", *såw+ĺ "clay", *pëëś+ka "mitten", and possibly *taĺ+ək "tip". (The last one could alternately be a case of irregular reflexation, the root being PU *tuðʲka.)
Several irregular-ish phonetical developments are similarly found in Mansi only, and seem unlikely to be retentions.
Notoriously, *ś has two distinct reflexes: plain *s as in the rest of East Uralic, but also *š in a number of words. Jury's still out on what the reason may be.
An irregular initial *j- appears in the reflexes of at least three Uralic roots: *ëla- "under", *äktɜ- "to cut", and *enä "big".
The reflexes of *käwðə "rope" and *kälä- "to wade" have gained labialization of the initial stop.
Initial *w has been lost in the word for "5", *ät. Cf. Khanty *weet, Komi вит /vit/, Estonian viis, etc. Loss in Hungarian öt is best explained by the regular loss of *w before labial vowels.
An assimilation *m → *n occurs in *oonl- "to sit (perf.)", *oont- "to sit (imperf.)" (root *amɜ-, *amsɜ-?)
Frequently something lenited like *j, *w, *ɣ, ∅ is found in correspondence to *ŋ in Khanty, including at least the reflexes of *oŋtɜ "sting", *suŋə "summer", *peŋərä "wheel, round", *soŋə- "to come in".
Expected *ɣ is lost in at least *tälwä "winter", *sëksa "cedar", *ńulka "fir".
There is irregular medial palatalization of expected *ɣ to *j in at least *mëksa "liver", *tow(k)ɜ "spring". (Some of the examples with *ŋ may belong under this and the previous.)
Further examples for a couple of these developments occur in vocabulary only found in Core Mansi, which seems to rule out them being due to mistaken etymologization. The tasklist of figuring out what exactly is going on with any of these will have to be left for the future though. If you ask me, the emergence of *kʷ in Mansi seems particularly deserving of a detailed study… after all, the Uralic languages are normally known for their palatalized consonants, not labialized ones!
Time to trot in some supporting evidence. An apparently somewhat obscure paper by K. Bergsland investigates the reflexes of the palatal nasal *ń in Finnic [1], and argues among other things for a position rather similar to what I've suggested to have occurred with palatal sibilants: a medial unpacking development *ń → *jn.
It seems that this and *ś, *ć → *js, *jc could be consolidated into a single change affecting all palatalized consonants inherited by earliest Proto-Finnic. Proto-Uralic had a fourth palatalized consonant as well, the "spirant" usually reconstructed as *ðʲ. This was however depalatalized earlier, at the West Uralic stage: e.g. kadota "to disappear", Samic *kuoðē-, Mordv. *kadə- "to leave" ← PU *kaðʲa-. Hence we have no reason to expect this to participate in a development particular to Finnic.
One apparent difference is that Bergsland suggests this change to have been general, unrelated to vowel quality. Yet his data includes only one item with a preceding labial back vowel: oinas "ram", compared with Samic *vuońës "tame". This comparison is questionable to begin with, as cognates from Samoyedic suggest an original PU root *ëńɜ, not *ońa as would be required to include Finnic. A Baltic loan etymology is also possible for this word, given Lithuanian ãvinas. (There are some details in this that I think call for comment, but they will be, once again, best left for another post.)
Other examples brought up by Bergsland are not numerous, but fit in my framework without problems. Two items count as explicit supporting evidence. The first is painaa = PF *paina- "to press", which can be compared with at least Komi /poń-/, Mordvinic *pańə- "to push", and probably Samic *puońō- "to dip (in smth)". (Mansi *poń- "to press" is probably better derived from *puńə- "to twist", on account of the short vowel.) Secondly B. compares Karelian leinä (no meaning given) with "among others" Samic *lāńē "young birch", Khanty /ɭiiń/ "slack". Other cognates proposed for these seem to include Mansi /liń/ and Komi /lɤń/ "weak", and even Estonian lein "mourning" which seems semantically too far off. Although Komi and Samic suggest *läńä while Karelian and Ob-Ugric suggest *leńä, this seems to still suffice as (weak) evidence for unpacking, particularly as the correspondence F *ei ~ K /ɤ/ was previously also seen in "knife".
Before /i/ a case with no unpacking (as I've previously predicted) exists: *mińä → miniä "daughter-in-law". B. suggests that here palatalization broke off to the right, ie. *ń → *-nj- → *-ni-. This is basically ad hoc and an analysis with no unpacking and a suffix *-iä seems equally possible.
Two examples with *ü show something else going on. This is a close front and labial vowel: my previous analysis would suggest that no unpacking will occur here either. Indeed none does, but there is vowel lengthening: *küńəl → kyynel "tear", *küńärä → kyynärä "cubit". In the absence of counterexamples this could be considered regular. Nothing immediately prohibits Bergsland's interpretation that the change was *üń → *üjn → *üün, but as there is no evidence otherwise for the 2nd change (and the oblique plural stem of *kälü "sister-in-law", *kälüi-, is fairly good counterevidence), direct lengthening is a better assumption.
Anyway, back to analyzing if there's evidence for a PU cluster *ŋś, or for it being reflected as Finnic *-js- or *-jc-. Getting highly technical now… (Part 1; part 2.)
A particularly damning case against the sound change *ŋ → *j can be found in the word for "swan": joutsen, again supposedly from something like *joŋ(k)śən(ə) according to traditional references on Finnish etymology. I get the impression the development is supposed to proceed thru an epenthesis *ŋś → *ŋkś which would block palatal assimilation, but there is no reason why other cases of *ŋś would not have gone thru this, nor is vocalization *ŋk → **u a thing, so the entire thing sounds like handwaving. This also has a problem similar to "7": external cognates don't really show evidence for a nasal inside the word. Samic *ńukčë, Mordvinic *lokśəŋ, Mari *jükćə, Permic *juś(k) are coherent with basically *-kś-, even if there's something weird up with the initial consonant.
Since a reconstruction *-kś- does not predict or even in any way explain *-ucc- in Finnic, perhaps *-ŋś- should after all be reconstructed here though: under my current model a vocalization *ŋ → *u would be quite acceptable, and *ŋs → *ks in Samic in the reflexes of "bow" (see part 1 in this series) indeed suggests *ŋś → *kč as the expected development for a cluster like this. Still I am not sure at all if this would be preferable to a reconstruction connecting the Samic word eastwards instead, and anyway, all the irregularities, or the absence of East Uralic cognates, don't particularly support a Proto-Uralic origin for this word.
Next, a word of limited distribution: this is suitsuta 'to smoke, to billow', which (besides clear cognates across Finnic, e.g. Estonian suits 'smoke') has been compared with Inari Sami sohčeđ, "Swedish Sami" tjåktjet. These add up to a common Proto-Sami form *Sokčë-, though with disagreement on what the initial sibilant was. Again *ŋś → *kč could be assumed, but this is unwarranted since the only direct parallel (in the previous word) has Finnic *-u-, not *-i-.
I'll suggest a different solution: the apparent Finnic *suiccu- might actually be via consonant gradation from a trisyllabic original *suɣiccu- (but I'd need to confirm what the Veps / Votic / Livonian evidence has to say about this; at least online dictionaries of Veps and Livonian seem to not know this word). In this case the development here could be *Sukəćə- → *sukəjcə-w- → *sukicu- → *suɣiccu- (with a similar gemination of *c as in "seven", "swan" and "knife", and -i- from palatal unpacking). The Samic forms are regularly derivable from this as well, by a simple loss of the 2nd syllable vowel.
The fifth word is the only one to be not found in Finnic. In the appendices to Historical phonology of the Uralic Languages (1988), P. Sammallahti does reconstruct a cluster *ŋś, but not in any of the usual suspects: for "to stand" he has simply *śanśa- (the initial *ś- seems to be only on account of Samic, which I would consider an innovation), and for "swan", *ńokśi, with no comment on how the diphthongs in Finnic would be derived. The cluster however appears in *läŋśä "gadfly", a reconstruction based on evidence from only Samic, Permic and Hungarian. And checking, it does not seem to hold up: according to latest results, the implied Samic **lāwčē yielding North Sami lávčá etc. (cf. the word's Álgu entry) is better derived as a loan from Scandinavian (Old Norse kleggi, I would assume from older *klagjV-, whence also South Samic *klāččē / Northwest Samic *lāwčē). Particularly damning are South Sami klahtje, Ume Sami /klačč/ which retain the un-Uralic initial consonant cluster. While the Permic and Hungarian words can remain cognate (U. /ludź/, K. /lɤdź/, H. légy) they are derivable from simply *nś.
(It's also funny to observe that this and all the previous words seem to show basically no consensus on what *ŋś is supposed to develop into in Samic.)
Lastly, a database stroll turns up a Finnish word previously unfamiliar to me, niiska 'fish milt'. Outside of Finnic this only has suggested cognates in Ob-Ugric. Perhaps these have provided the initial motivation for assuming a palatalization of *ŋ: Khanty *ńeŋsəŋ, Tavdin /ńäńćī/, Core Mansi *ńinśəŋ. If these forms all are cognate, they indeed testify for a shift *ŋś → *nś having occurred in Mansi. Still, even if *ŋś were reconstructed here on account of Khanty, it would remain unconvincing to use the same reconstruction in 'knife', 'to stand' which have Khanty *-nć-.
Even here it also does not seem necessary to assume that the Finnic long *ii derives via a palatality assimilation **ńiŋś-ka → **ńijśka → *niiska. This time as well, a better reconstruction for this root may be a trisyllabic structure, *ńiŋəśə, with *-iŋə- regularly contracted to *ii in Finnic (as in e.g. *šiŋərə → *hiiri 'mouse'). I grant that in Ob-Ugric, apocope between the 2nd and the 3rd syllable would not be normally expected (e.g. *puśərə- → Kh *poosər- 'to press', *ńälə-mä → Kh *ńääləm, Ms *ńeeləm 'tung') — but the further suffix *-(ə)ŋ seems like the explanation: a structure **ńeŋəsəŋ with /ə/ in an open syllable would, AFAIK, not have been valid in the Proto-(Ob-)Ugric period. Hence, *ńiŋəśə-ŋ → *ńeŋśəŋ.
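The contraction step alone can be sketched as a one-line rewrite rule. This is only a toy illustration of *-iŋə- → *ii, not a full derivation: the function name is made up for the example, and the later changes that lead to the attested Finnic forms (initial *š → *h, the fate of the final syllables, depalatalization) are deliberately left out.

```python
import re

def contract_ing_schwa(form: str) -> str:
    """Apply only the contraction iŋə -> ii; every other sound change is ignored."""
    return re.sub("iŋə", "ii", form)

# Intermediate shapes after the contraction step only (not attested forms)
for root in ["šiŋərə", "ńiŋəśə"]:
    print("*" + root, "→", "*" + contract_ing_schwa(root))
```

Both roots end up with a long *ii by the same step, which is the point of the comparison: no palatal assimilation of *ŋ is needed to get the long vowel.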
I'll address some details of vocalism as well, now that I've attempted rehabilitating this comparison. The normal outcome of PU *i is Khanty *ee, Mansi *ä, while the correspondence Kh *e ~ Ms *ä or *i usually goes back to PU *e. This has probably been a prime reason this root is discarded by e.g. Sammallahti.
It seems to me there is a possibility Khanty having short *e could be attributed to the following consonant cluster. Though no direct parallels exist, the corresponding split among back vowels is well attested: PU *u yields Khanty *o before consonant clusters, *oo before single consonants. *śilmä → *seem 'eye' is not a counterexample as *lm was simplified to *m already in the common Ugric period, albeit *ipsə → *eepəɬ 'smell' seems to be; yet with only two examples, it is hard to say which should be considered the regular outcome. (Also worth noting is that the third close vowel of PU, *ü, is consistently reflected as short *ö in Khanty, regardless of syllable structure.)
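The conditioning described above can be written out as a small predictor. This is a rough sketch under several assumptions: it only looks at the first *u or *i in a form, it counts every non-vowel symbol after that vowel as one consonant, and the *i row encodes the tentative hypothesis rather than an established rule. The names `REFLEXES` and `predicted_khanty_vowel` are invented for the example.

```python
import unicodedata

VOWELS = set("aeiouäöüëə")

# (reflex before a consonant cluster, reflex before a single consonant)
REFLEXES = {
    "u": ("o", "oo"),  # described in the text as well attested
    "i": ("e", "ee"),  # hypothetical parallel split, not established
}

def predicted_khanty_vowel(pu_form: str) -> str:
    """Predict the Khanty reflex of the first *u or *i from what follows it."""
    form = unicodedata.normalize("NFC", pu_form)
    for pos, ch in enumerate(form):
        if ch in REFLEXES:
            break
    else:
        raise ValueError("form contains no *u or *i")
    # Count consonants between this vowel and the next vowel (or word end)
    cluster_len = 0
    for nxt in form[pos + 1:]:
        if nxt in VOWELS:
            break
        cluster_len += 1
    short, long_ = REFLEXES[ch]
    return short if cluster_len >= 2 else long_

for form in ["puśərə", "ńiŋśəŋ", "ipsə"]:
    print("*" + form, "→", predicted_khanty_vowel(form))
```

The first two predictions agree with Khanty *poosər- and *ńeŋsəŋ as cited above. For *ipsə the sketch predicts short *e, while the form cited is *eepəɬ with a long vowel, which is exactly the counterexample already mentioned.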
The difference between /ä/ in Tavdin and /i/ in Core Mansi is simpler to explain: *ä was probably shifted to CMs /i/ after the palatal initial /ń/. A parallel is found in Tv /ńär/ ~ CMs /ńir/ 'branch, twig'.
There actually are a couple further words I could treat here as well, but they show simply too much irregularity across the cognate set for any arguments to be based on them. But for the sake of completeness: these are the word for 'lizard', which is apparently /šiŋšale/ in a couple Mari dialects (also Finnish sisilisko, etc.); and a word referring to a "certain" small bird, appearing as *ćäŋćii in Mansi (and restricted to East Uralic anyway in distribution).
Current conclusions: There are no clear cases where *ŋś would have to be posited for Proto-Uralic, or otherwise in pre-Finnic. Hence, there also is no reason to posit a palatal assimilation *ŋ → i in such a context. Words that have been considered evidence for this change seem explainable via palatal unpacking: *ś, *ć → *js, *jc, when following the illabial vowels *e, *a.
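As a compact restatement of the unpacking rule, here is a minimal sketch. The inputs are schematic templates (C stands for any consonant), not reconstructed words, and the function name is made up for the illustration. Only the conditioning stated above is modeled: unpacking after *e and *a, none after other vowels such as *ü.

```python
import re
import unicodedata

# Plain counterparts of the palatal sibilant and affricate
PLAIN = {"ś": "s", "ć": "c"}

def palatal_unpacking(form: str) -> str:
    """Rewrite ś, ć as js, jc when they directly follow e or a."""
    form = unicodedata.normalize("NFC", form)
    return re.sub(
        r"([ea])([ść])",
        lambda m: m.group(1) + "j" + PLAIN[m.group(2)],
        form,
    )

# Schematic shapes only, not actual etyma
for template in ["Ceśə", "Caćə", "Cüśə"]:
    print("*" + template, "→", "*" + palatal_unpacking(template))
```

The third template comes out unchanged, matching the observation that no unpacking occurs after *ü.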
My stance on Indo-Uralic is complicated. I do think the relationship appears promising in a couple ways, but at the same time, plenty of work is needed before it can be more than that. It is not an immediately obvious grouping (most of the material used for supporting a relationship has parallels elsewhere across the "Nostratic" families of Eurasia as well), and yet at the same time, attributing the most basal similarities to solely loan contacts does not make much sense to me either. This layer consists almost solely of such basic vocabulary as pronoun roots and simple verbs, with more typical loanword material absent.
J. Koivulehto, veteran tracker of IE loan etymologies in Uralic, has suggested that this would be due to the overall good retention rates of such words, while nominals could get lost or replaced. Sounds good on first glance — but when the layer of Proto-Indo-Iranian loanwords looks nothing alike, with specialized and cultural terminology well-preserved, it becomes required to project the PU~PIE layer into distant past, not almost contemporary with the PU~PII layer as current analyses suggest (these invoke an archaic "Northwest Indo-European" that would have remained until PII times quite similar to PIE).
(Then again, I also think the PII layer is younger in the Uralic relativ chronology than currently thought. I just mentioned how "honey" fails to correctly work in Hungarian, and I have similar suspicions for many other words.)
(I even have a couple ideas up my sleeve that would allow internal Uralic etymologies for a few of the alleged PIE loans, but that'll have to be a topic for later as well.)
Proto-Indo-Uralic does not exist, i.e. has not been reconstructed. Of course, PIE and PU are phonologically quite different, and hence even a cursory look at their inventories will suggest several sound correspondences. PIE complex stops : PU plain stops; PIE laryngeals : PU *k, *x, ∅; PU complex vowels : PIE *e, *o; PU complex sibilants : PIE *s. (Some macrocomparativists have even naïvely suggested comparing PIE palatovelars with various PU palatal consonants, but these words seem generally best analyzed as Indo-Iranian loanwords.) This is of course convenient, yet it also sets up a high risk of running into accidental similarities, as long as the variety cannot be reduced by means of internal reconstruction or the evidence of pre-protolanguage level loanwords. Differences such as abundant word-initial clusters in PIE yet none in PU also would have to be addressed.
Now, internal reconstruction of PIE is a thriving area of research; it is within Uralic where very little has been explored. Potential does exist though, as can be seen from doublets such as *sula- "to melt" ~ *suŋə "summer, to thaw", or *jalka "foot" ~ *jälkə "footprint".
I think in general we would have some ground to begin talking of an actual reconstruction (be it of a common ancestor or of the pre-proto-languages during an older stage of loan contacts) once we can identify some non-trivial correspondences, or use evidence from one branch to shed light on issues in another. These are rather lacking at the moment. Comparisons around here do not even address the sound values of PU "dental spirants", the number and identity of PIE laryngeals, or other standing issues in the reconstruction of each family.
So, in case "proponent" means "considers it a topic worth working on", count me in! A relationship between two adjacent language families is not a particularly extraordinary claim and it does not require particularly extraordinary evidence. Just don't expect me to eschew the regular level of skepticism that etymological comparisons need to be subjected to; or for that matter, to ignore any possible competing proposals (Uralo-Yukaghir, Uralo-Eskimo, various flavors of Ural-Altaic, and so forth also need a careful analysis).
…Now what the helvettiä is going on here? Why is a single Samoyedic group used while the Samic varieties are at least implicitly recognized? Why are the Ob-Ugric, as well as the Mordvinic languages of all things, also subsumed under "Samoyedic"? And where have Mari, Udmurt, Veps etc. disappeared?
The Mongolic and Tungusic situation is also interesting, as these get no subdivision whatsoever — a glaring contrast to all the detail paid on enumerating the Turkic idioms.
I wonder where maps like this keep coming from. We've e.g. had a fairly accurate map of the Uralic languages on Wikipedia for like four years now for easy comparison, and the World Atlas of Language Structures also lists most of the family quite well. Why tack on neighboring families on a map like this without any attempt at an accurate or even a consistent level of detail?
Also here's an example of a proto-cluster that is a different kind of questionable. Consider this table showcasing the descent of *šk across the Uralic family:
Although fairly quickly assembled (this is based on data from the StarLing database of the Tower of Babel project, which, although handy for checking data, comes with a large amount of outdated comparisons and by rule unreliable reconstructions), this collection should suffice to highlight a problem: while there are plenty of examples, they are quite disproportionally distributed. Pretty much nothing turns up in the entirety of East Uralic, and one of the cases that does (the 1st 'brittle' word), points to *č(k) anyway. (On a closer look, perhaps two roots should rather be reconstructed here: Permic-Ugric *ručɜ 'rotten' and Finnic-Mari-Komi *roška 'brittle'; these even contrast as Komi /rɨž/ vs. /rɤš/.) It is not a wonder that it has been proposed that *š only first came about in the Finno-Permic period.
Yet we need not, and should not stop there. Also Samic, part of the West Uralic group together with Finnic and Mordvinic, reflects no more than two roots, even though the branch is in general known to have been almost as successful as Finnic at retaining original Uralic vocabulary. There are problems with regularity here as well. In the reflexes of the cluster itself (issues in vowels etc. also exist but let's not go into those right now) this may be due to conditional developments though… In Mari, in /toš/ 'back of knife' the cluster has probably been simplified due to word-final position; /luš/ existing besides /luškəðo/ 'weak' also suggests as much. Similarly /pükš/ 'hazel tree' seems comparable to Mordv. *päšks of the same meaning, with a suffix *-s triggering another cluster simplification. The 'round' root here, *keškərä or *kečkərä, then seems like a derivativ of *keččä 'ring', which would explain at least the correspondence F *h ~ Mo *č, although at present I have no explanation for why that would yield Mari -š-.
Regardless: it seems to me the uneven distribution should allow concluding that there was no original Proto-Uralic *šk, and what we are dealing with here is a layer reflecting some adstrate(s) (as well as some accidental similarities) that have provided words with *šk to Finnic, Mordvinic, Mari & Permic. These are the "southwestern" languages of the family, closest to the ever-revolving palette of linguistic influences from the direction of central Europe, the latter three also from the steppes. Also a number of specific agricultural terms (*wešnä 'wheat' in F+M+M, *čošə 'barley' in M+M+P, *skal 'cow' in M+M+P, etc.) can be reconstructed between these languages; these are very unlikely to derive from PU, some of them sporting quite obviously un-Uralic shapes. Whatever language(s) these originate from seems like a good candidate for sowing *šk as well.
—The case of *kš is even more glaring:
This table's completely dominated by Finnic and Mordvinic (which admittedly correspond perfectly well). The abundance of three-consonant clusters *kšt, *kšk should also be a red warning light. Moreover, this time a general appeal to unknown substrates, adstrates or contact languages is not the only option, as *mekšə 'bee' is a well-known Indo-Iranian loan, from PII *makši or its predecessor *mekši. Also, despite this having a decently wide distribution, including a perfectly natural excuse for being absent from Samic (lack of an apiculture tradition, if not bees altogether), I regardless think it would have arrived only after the initial separation of Uralic: the /h/ in Hungarian méh is without precedent and contradicts known facts about the development of similar clusters (*š and *s are supposed to merge in East Uralic, and yet e.g. *mëksa "liver" → máj, *joŋsə 'bow' → *joksə → íj).
Only 'cold' has a good shot at being an original PU root. This is a doublet *ja/ëkša, *jäkšä (compare Estonian jahtu- vs. Finnish jäähty-, both meaning 'to cool'). The vowel variation is likely related to the existence of *jäŋə 'ice'. I also cannot rule out that Khanty *jööɣɬii might derive straight from something along the lines of *jäŋ-s/š- with no direct relation to the Western forms, which would put us back to the "no good evidence for this being an original root" stage…
Got sidetracked into doing some more general analysis of clusters involving palatal consonants. An observation that raised a couple eyebrows was two words that seem to have funny reflexes of *kś: these are *ńukśə "sable", and *sükśə "autumn".
For starters, Estonian seems to have split these clusters: nugis, sügis. A similar but not quite equivalent effect can be seen in Finnish nois, syys (implying older *noɣis, *süɣüs; though the latter also exists as syksy). If palatal consonants are supposed to coalesce with plain ones in Finnic, this cannot really be attributed to the original palatalization, since *ks holds up perfectly well: *suksə "ski" → Es. suks, *mëksa "liver" → Es. maks.
I could propose that there was a development *kś → *kjs → *kis or something along these lines, but this seems to go deeper than that. In Udmurt, these roots show up as /niź/ and /siźɨĺ/, both with a voiced sibilant, and /i/ in place of expected /ɨ/. Out of line in exactly parallel ways — doesn't seem this could be wholly accidental. (Compare *ńulka → /ńɨl/ "fir", or *oksa → /us/ "branch".) At least the first effect hints at the PU roots rather having had a structure something like *CVkəśə, with usual medial voicing + later loss of *g in Permic. There is a slim chance this could be related to the irregular vocalism too, though *lukə(-nta) → /lɨd/ "number" suggests that probably not.
A third issue with these Udmurt reflexes is both of them having an initial non-palatal consonant. In the case of "autumn", Mansi *tüks and Hungarian ősz apparently speak for this being original; Samic *čëkčë, Mordvinic *śokś and Khanty *söɣəs should probably be explained as long-range assimilation to the 2nd sibilant. With "sable", this gets harder: even the closely related Komi has /ńiź/, and similarly Hungarian nyuszt ("marten"), Mansi *ńoks, Khanty *ńoɣəs. There is apparently Forest Nenets /noxo/ with no palatalization, yet of low evidential value since this means "arctic fox" and lacks any other Samoyedic cognates. Regardless, there is no particular obstacle for assuming an assimilation *n-ś → *ń-ś for Komi + Ugric. The only potential counterexamples I can find are a word for "to knead", something like *nänś-, only found in Mari and Udmurt; and a word for "very", something like *nanś-, only found in Komi and Hungarian. Both seem like prime candidates for being of later than PU origin.
Back to the original issue, the Mansi reflexes having *-ks is not 100% expected either, for that matter. *siɣs "gull", comparable to Finno-Samic *śäkśə "osprey" seems to show the usual East Uralic development *-kC- → *-ɣ(ə)C-. Yet this is irregular in the "wrong direction"; as in Permic, we'd expect *-kəś- to be more conducive to lenition of *k, not less! It's possible this is not a PU bird name after all, but two independent onomatopoetic coinages, and that *-kś- → *ks is regular for Mansi. There are other cases as well where Khanty has lenited *-k- but Mansi has not, at least Kh *eeɣət- ~ Ms *jäkt- "to chop". These all seem to go back to an original cluster though, and it's not clear if a trisyllabic root would work.
Assuming *-kəś- has other problems too. In Mordvinic too a structure like this should undergo medial voicing, to yield something like **śügəźə → **śej(ə)ź.
The situation appears to be one where Permic and Finnic point to one thing, while the rest of the family points to another. A last-resort idea that comes to mind is inter-family loaning… F → P loans are a known thing. What if there indeed was a Finnic development something along the lines *kś → *kjs, then this got into Permic? This would seem to even explain the vowel in "autumn", as a substitute for the *ü that still persisted as a front vowel in Finnic. This is of zero help for the "sable" case though where the vowel has gone from *u → *ɨ → /i/, and it's also very unclear how palatalization would still carry over here. How vexing.