Nominal suffixes as markers of information structure in Basketo

This paper deals with the information function of two nominal suffixes in Basketo, a North Omotic language predominantly spoken in the Basketo Special Woreda in Ethiopia: -i, which appears on all nouns, and -n-, which appears in first- and second-person pronouns. The suffix -i is often described as nominative. However, object nouns without a definite marker can also be marked by -i, and as a result -i can appear on both subject and object in the same sentence. We analyze the morpheme -i as a marker of specificity. The suffix -n- distinguishes the short and long forms of the first- and second-person subject pronouns. The short form is the same as the possessive. In general, the possessive does not bear any pragmatic information in discourse. Likewise, short pronouns show no pragmatic function, but only indicate what is subject or agent in a clause. Long pronouns, on the other hand, are morphologically and pragmatically marked. We analyze the morpheme -n- as marking the foregrounded topic in discourse, in contrast with zero anaphora.


Introduction
This paper 1 deals with the information function of two morphemes, -i appearing on all nouns, and -n- in first- and second-person subject pronouns in Basketo 2 , a North Omotic language predominantly spoken in the Basketo Special Woreda in Ethiopia. According to the Federal Democratic Republic of Ethiopia Population Census Commission, the number of native speakers of Basketo is estimated at 78,284 (2007 census). Basketo is one of the least studied languages of Ethiopia. There are some recent studies of the language, which deal mostly with morphology. Amha (1993, 1995) deals with noun morphology including personal pronouns. Schütz (2006) analyzes nominative and accusative marking of common and personal pronouns in the framework of Distributed Morphology. Sottile (2002: 90-105), which is a descriptive grammar of Basketo, deals with personal pronouns. Treis (2014) analyzes the grammatical means of encoding interrogativity in Basketo, based on a corpus of recorded spontaneous speech events. However, none of these deal with the information function of -i or of the long and short forms of first- and second-person pronouns discussed here.

Methodological preliminaries
The analysis of Basketo adopts the framework of information structure presented in Lambrecht (1994). According to the proposed theory, the most important categories of information structure are: 1) presupposition and assertion, 2) identifiability and activation, 3) topic and focus.
Presupposition and assertion have to do with the structuring of propositions into portions which a speaker assumes an addressee already knows or does not yet know. According to Lambrecht (1994: 51ff.), pragmatic presupposition is the set of propositions lexicogrammatically evoked in a sentence which the speaker assumes the hearer already knows or is ready to take for granted at the time the sentence is uttered. On the other hand, the pragmatic assertion is the proposition expressed by a sentence which the hearer is expected to know or take for granted as a result of hearing the sentence uttered. For example, in using a restrictive relative clause as in I finally met the woman who moved in downstairs 3 , the proposition expressed by the relative clause becomes part of the pragmatic presupposition, namely "old information". The main clause bears a (pragmatic) assertion, namely "new information" 4 .
1 Data for this paper have been collected during my fieldwork in Arba Minch and Basketo, with a native speaker of Basketo. My special thanks go to Mr. Fiqre Dejene, my foremost informant, whose efforts to help my studies were far beyond the ordinary. My research is supported by a Grant-in-Aid for Scientific Research (no. 18KK0009) from the Ministry of Education, Science and Culture in Japan.
2 ISO 639-3 code: bst. Basketo has 29 consonants and 10 vowels as follows: p, t, ʦ, ʧ, k, ʔ, b, d, ʣ, ɡ, p', ʦ', ʧ', k', ɓ, ɗ, ɸ, s, ʃ, h, z, ʒ, ɦ, m, n, l, r, w, j, i, e, a, o, u, ii, ee, aa, oo, uu. Acute accent represents high tone.
The second category concerns referents. Identifiability and activation have to do with a speaker's assumptions about the statuses of the mental representations of discourse referents in the addressee's mind at the time of an utterance. According to Lambrecht (1994: 77ff.), an identifiable referent is one for which a shared representation already exists in the speaker's and the hearer's mind at the time of utterance, while an unidentifiable referent is one for which a representation exists only in the speaker's mind. Identifiability has to do with the grammatical categories of definiteness and specificity. Definiteness is a formal feature associated with nominal expressions which signals whether or not the referent of a phrase is assumed by the speaker to be identifiable to the hearer. Specificity has to do with the referent of indefinite noun phrases. A specific indefinite NP is one whose referent is identifiable to the speaker but not to the hearer, while a non-specific indefinite NP is one whose referent neither the speaker nor the hearer can identify at the time of utterance. We will use this framework to discuss the nominal suffix -i in subsection 5.1.
On the other hand, activation has to do with consciousness. According to Lambrecht (1994: 93ff.), the psychological factors determining the activation states 5 of discourse referents are consciousness and the difference between short-term memory and long-term memory. An item is active if it is "currently lit up" in our consciousness, and activation normally ceases as soon as some other item is lit up instead. The active state of a referent is typically expressed formally via pronominal coding of the corresponding linguistic expression. Pronominal coding applies to free and bound pronouns, inflectional affixes, and null instantiation (zero coding) of an argument.
3 According to Lambrecht (1994: 55-56), the pragmatic presuppositions lexicogrammatically evoked with the utterance in this sentence can be loosely stated as the following set of propositions: 1) the addressee can identify the female individual designated by the definite noun phrase (by a grammatical morpheme, the definite article the), 2) someone moved in downstairs from the speaker (by a grammatical construction, the relative clause who moved in downstairs), 3) one would have expected the speaker to have met that individual at some earlier point in time (by a lexical item, the adverb finally).
4 Lambrecht restricts the use of the terms "old information" and "new information" to aspects of information associated with propositions here.
5 Chafe (1987: 25ff.) defines three different activation states. An active concept ("given information") is one that is currently lit up, a concept in a person's focus of consciousness. A semi-active concept ("accessible information") is one that is in a person's peripheral consciousness, a concept of which a person has a background awareness, but which is not being directly focused on. An inactive concept ("new information") is one that is currently in a person's long-term memory, neither focally nor peripherally active. Lambrecht (1994: 94) refers to what Chafe calls "concept" as "(mental representation of) referent".
The final category concerns relations. Topic and focus have to do with a speaker's assessment of the relative predictability vs. unpredictability of the relations between propositions and their elements in a given discourse situation. Topic is the predictable element in an utterance. Therefore, topic is included in the pragmatic presupposition without being identical to it. On the other hand, focus is that portion of a proposition which cannot be taken for granted at the time of speech. It is the unpredictable or pragmatically non-recoverable element in an utterance. The focus of a sentence is generally seen as an element of information which is added to the pragmatic presupposition. Therefore, focus is part of an assertion without coinciding with it.
Topic referents have a degree of pragmatic accessibility. To postulate a general correlation between the activation and identifiability states of topic referents and the pragmatic acceptability of sentences, we can adopt Givón's scale for the coding of topic accessibility (Fig. 1). The phonological scale runs from zero anaphora to stressed/independent pronouns, and the word-order scale from R(ight) dislocated DEF-NP's to L(eft) dislocated DEF-NP's. As for continuity or accessibility, the left-most element codes more continuous topics, while the right-most codes less continuous ones 6 . Both Y-movement 7 (contrastive topicalization) and cleft-focus can be considered instances of more discontinuous/surprising topic constructions where the topic is placed to the left of the comment. We will mainly use the phonological scale to discuss the nominal suffix -n- in subsection 5.2. Zero anaphora is most obvious and picks up the most continuous and accessible topic for the speaker and hearer. We will regard it as a backgrounded topic, in contrast with independent long personal pronouns in Basketo as a foregrounded topic.
[Fig. 1: Scale for the coding of topic accessibility (Givón 1983: 17)]
6 According to Givón (1983: 19ff.), the topic-comment orders show higher average numerical values for referential distance than comment-topic orders in some data of several languages.
7 The term Y-movement (or: Yiddish movement) was used to describe the fronting of a noun phrase, which was felt to be reminiscent of Yiddish-influenced American English. Givón defines Y-movement as an object-topicalizing construction, where the more topical patient/object is fronted and the less topical agent/subject is postponed (yielding OSV in SVO languages). The question will be further illustrated in examples (30c) and (31c). See also Pekarek, De Stefani & Horlacher (2015: 51).
In addition, we will define the linguistic behaviour of the possessive forms of personal pronouns here.
Herslund & Baron (2001: 2-3) define "possession as the linguistic expression of the relation between two entities, a Possessor and a Possessum, such that one, the Possessor, is seen as being in some way related to the other, the Possessum, as having it near or controlling it". Possession can be classified into two main types of linguistic constructions: attributive (e.g. my credit card) and predicative (e.g. I have a credit card). The former, attributive possession, is highly general in meaning, while the latter, predicative possession, is more specific 8 . For example, compare I come home with they always prepared for my coming home: the former's subject I bears the topic automatically in a sentence 9 , and the latter's possessive my modifies the verbal noun only and does not bear any pragmatic (namely, topic or focus) information in a sentence unless the possessive has strong (marked) accent in English. Therefore, the possessive forms of personal pronouns discussed here show only grammatical or semantic function at phrase level, and bear no pragmatic information in discourse. We will use this idea for explaining the function of short pronouns of Basketo later.
8 According to Heine (1997: 28-29), a phrase like my house may derive from a large number of underlying sentences such as I own the house, I live in the house, I rented the house, I built the house, etc.
9 The notion of topic or theme as the first element in the sentence is extensively discussed in Prague School research (Functional Sentence Perspective). See Firbas (1979) about the idea of communicative dynamism (CD). However, from the viewpoint of typological studies, sentence-initial elements can be not only topics but also foci, given the strategy of left-movement (pre-verbal ordering) in V-initial languages. In any case, the sentence-initial element has some pragmatic function in a sentence.

Grammatical outline of Basketo
This section gives a grammatical outline of Basketo focusing on aspects which are relevant for the topic under discussion.

Terminal vowels
Terminal vowels are the stem-final vowels found in citation forms (called "absolutive forms") of nominals in North Omotic languages generally. In Basketo the terminal vowels are /a/ for masculine nouns 10 and /o/ for the rare feminine nouns 11 . According to Hayward (2001), Basketo has both the unstable terminal vowel (UTV) type and the stable terminal vowel (STV) type. The former is found in core cases (nominative, accusative and genitive), and the latter in oblique cases (dative, ablative, instrumental etc.) and with definiteness. Table 1 shows the UTV and STV types of Basketo.

Case marking
Basketo has both case marking and verb agreement. Both case marking and agreement can express the relation between a main verb and its dependent noun phrases within a clause. In languages in general, morphological marking of grammatical relations may appear in either the head or the dependent member of the constituent, in both or in neither. Nichols (1986) calls these four types of marking head marking, dependent marking, double marking, and neutral marking.
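Nichols' four-way typology follows from two yes/no questions: is the relation marked on the head, and is it marked on the dependent? The following Python sketch merely restates that classification; the function name marking_type and its two boolean parameters are hypothetical and are introduced here only for illustration.

```python
def marking_type(marked_on_head: bool, marked_on_dependent: bool) -> str:
    """Classify a construction by where its grammatical relation is marked.

    Hypothetical helper: the four labels are those of Nichols (1986) as
    cited in the text; the function itself is only an illustration.
    """
    if marked_on_head and marked_on_dependent:
        return "double marking"
    if marked_on_head:
        return "head marking"
    if marked_on_dependent:
        return "dependent marking"
    return "neutral marking"


# A clause with case suffixes on the nouns (dependents) and agreement on
# the verb (head) is classified as double marking under this scheme.
print(marking_type(marked_on_head=True, marked_on_dependent=True))
```

For a clause, the head is the verb and the dependents are its noun phrases, so case suffixes count as dependent marking and agreement affixes as head marking.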
For a verb and its dependent nouns there are the following four possibilities (in all distinguished types, heads are indicated by superscript H, affixal markers by M): Dependent-marked: Head-marked: Double-marked: Neutral-marked:
Basketo has a nominative-accusative system like other Omotic languages. However, morphological marking is partly determined by definiteness, as in North Ometo languages. Indefinite nouns are morphologically marked only in the nominative, with the suffix -i. In this case Basketo has a marked-nominative system 13 . Definite nouns, on the other hand, are morphologically marked for both nominative (-di) and accusative (-dani). However, contrary to other Ometo languages, object nouns without a definite marker can be marked with the suffix -i, which seems to be a marker of specificity 14 . This shows a neutral case-marking system. Therefore, Basketo has a split marked-nominative system. Tables 2 and 3 show nominative / accusative case marking of definiteness and specificity in Basketo. We will discuss the specific function of -i in the subsections below 15 .
As an illustration, examples (1-2) are for non-specific (i.e. generic) nominative and accusative, example (3) for marked nominative, example (4) for both marked with definite marker, and example (5) for both nominative and accusative marked by -i as a specific marker.

Verb agreement
Verb conjugation in Basketo shows both subject agreement and aspect. Subject agreement indicates person, gender, and number. Aspect distinguishes imperfective and perfective. Verb conjugation shows "polyfunctionality", expressing person, gender, and number by one portmanteau morpheme, and is highly syncretic. Thus, in the imperfective the suffix -áre is used for 1SG, 2SG and 3SG.F and -íre for 3SG.M and all plural persons. There are two perfectives: the recent past with -áde / -íde and the past with -íne 16 . In the former the suffix -áde is used for 1SG, 2SG and 3SG.F and -íde for 3SG.M and all plural persons. In the latter, the suffix -íne is used for all persons, making agreement a poor guide to the verb's subject. Therefore, subject nouns, especially independent personal pronouns, tend to be overt. See example (6) for -áde / -íne and (7) for -íde / -íne. In Table 4 the syncretic paradigm of Basketo is compared to the fully differentiated paradigm of Wolaytta.
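The extent of this syncretism can be made concrete with a small lookup table. The Python sketch below is only an illustration under stated assumptions: the names AGREEMENT_SUFFIXES and possible_subjects are hypothetical, the labels 1PL, 2PL and 3PL are assumed as a spelling-out of "all plural", and the suffix forms are copied from the description above rather than from the full paradigm in Table 4.

```python
# Illustrative sketch only. Suffix forms are those cited in the text;
# the plural labels 1PL/2PL/3PL are an assumed reading of "all plural".
A_SERIES = ("1SG", "2SG", "3SG.F")            # subjects sharing -áre / -áde
I_SERIES = ("3SG.M", "1PL", "2PL", "3PL")     # subjects sharing -íre / -íde
ALL_SUBJECTS = A_SERIES + I_SERIES

AGREEMENT_SUFFIXES = {
    "imperfective": {"-áre": A_SERIES, "-íre": I_SERIES},
    "recent past": {"-áde": A_SERIES, "-íde": I_SERIES},
    "past": {"-íne": ALL_SUBJECTS},
}


def possible_subjects(aspect, suffix):
    """Return the subject categories compatible with a given verb suffix."""
    return AGREEMENT_SUFFIXES[aspect][suffix]


# Show how many subject categories each suffix leaves open.
for aspect, table in AGREEMENT_SUFFIXES.items():
    for suffix, subjects in table.items():
        print(f"{aspect:12} {suffix:5} {len(subjects)} possible subjects")
```

Under these assumptions no suffix narrows the subject down to fewer than three categories, and the past suffix leaves all seven open, which is consistent with the tendency for subject pronouns to be overt.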

Personal pronouns
First- and second-person subject pronouns 17 in Basketo have short and long forms. Similar pronoun paradigms are found in Ometo languages, including Wolaytta, Gamo, Gofa, and Dawro, as well as in Bench (see Amha 2012: 471). The short form is the same as the possessive form, which is morphologically the simplest form. The long form shows the morpheme -i, parallel to nouns. The object can be marked by -na for all personal pronouns, and for proper and kinship nouns. This morpheme may be a survival of the old accusative marker (see Hayward & Tsuge 1998: 22-26). Tables 5-7 show the paradigms of proper nouns, kinship nouns, and personal pronouns (boldface for the old accusative marker), respectively.

Adjective predicates
Although Basketo is said to have a nominative-accusative system, in adjective predicates we find an interesting marking pattern governed by specificity and definiteness. Adjective predicates typically express features or properties of their subject nouns. Therefore, both inanimate and animate nouns, if indefinite, use the citation (absolutive) form in -a. On the other hand, definite nouns are marked by nominative -i. Specific human nouns are also marked by the nominative. This shows that the morpheme -i has to do with specificity and definiteness. See examples (8a) and (9a) for non-definite and (8b) and (9b) for definite in inanimate and animal nouns. In example (10) human nouns can be marked by either absolutive -a or nominative -i, but in example (11) divine nouns are always marked by nominative -i, irrespective of the definite marker -d. Likewise, a generic subject in an intransitive clause also takes absolutive -a. See example (12) for generic meaning, and example (13) for specific meaning in an intransitive clause.

Statistical analysis of corpus
Here we use statistical data from a corpus of spoken Basketo, a conversation between four children in Balt'a village 18 . This corpus of recorded spontaneous speech consists of 82 clauses 19 . In language description, when we collect transitive sentences, we usually include both subject and object. On the other hand, recorded natural conversation foregrounds another aspect of language: the information structure.
18 Balt'a village is located about 30 minutes by car from the town of Basketo. The informants of this corpus are children from 8 to 10 years old. Children in the village use Basketo in everyday conversation, but since the lingua franca is Amharic, they also use Amharic to communicate with other communities, especially at school. They know one or two neighbouring languages imperfectly, too.
19 The breakdown of that is as follows: 58 main clauses, 15 converbs, 3 adverbial clauses,

Subject marking
We found three types of anaphoric subject form: zero, short, and long forms. See example (14) for zero anaphora, (15) for the short form, and (16) for the long form. Table 8 shows the occurrence of singular first- and second-person pronouns (there are no plural forms). Singular first- and second-person subject referents account for 84% of the total verbs. This shows that in natural conversation either the speaker, marked by the first person, or the hearer, marked by the second person, is very often topicalized. For first- and second-person subjects, statistical data on null, short, and long forms are shown in Table 9. Null forms account for about two-thirds of the total, while long forms account for less than 10%. The long form therefore appears to be rather marked in discourse. With third-person subjects, zero anaphora accounts for less than one-fourth of the total, while specific nouns rise to more than 50% (Table 10). Specific subject nouns therefore appear to be unmarked in discourse. Specific nouns show earlier topics resumed and marked for the addressee's identifiability and activation.

Object marking
The types of object marking are shown in Table 11. In general, indefinite nouns function as new information and become the focus of the sentence. On the other hand, zero anaphora picks up an activated referent and becomes the backgrounded topic in discourse. Transitive clauses account for 48 of the 82 clauses in the corpus 20 . Implicit objects (zero anaphora) account for 60% of the total and refer to the most accessible (activated) referent, typically the current topic. On the other hand, specific objects are very rare. Definite objects account for more than one-fourth of the total and show an overt topic with a possessive or demonstrative.

NP deletion
Many Omotic languages morphologically distinguish two types of converbs for switch reference: the same-subject converb and the different-subject converb. The former indicates that the converb's subject is the same as the subject of the main verb. The latter indicates that the converb's subject is different from the subject of the main verb 21 . A converb is defined as a nonfinite verb form whose main function is to mark adverbial subordination. In a nonfinite clause the subject cross-linguistically tends to be unexpressed and thus depends for its referential interpretation on the overtly expressed subjects of main clauses. However, we found all four patterns of NP deletion (Table 12) in the data. These data show that NP deletion in Basketo is not a matter of syntax but of pragmatics. We must explain the appearance of the subject in examples (21)-(24) in terms of information structure.

-i as a specific marker
Within the Afroasiatic phylum, marked nominative is found in Berber, Cushitic, and Omotic languages. Within Omotic, North Omotic languages show a concentration of marked-nominative languages; the Ometo languages in particular are mostly marked nominative (König 2006: 695-698). Tosco (1994: 236) argues that the nominative -i of Basketo has been grammaticalized out of a topic marker. He analyzed the suffix -i of Basketo as functioning more like a topic marker than a nominative marker, unlike that of Wolaytta and Gamo. In the previous section we showed that the suffix -i of Basketo can appear on both subject and object in the same sentence. Therefore, we can analyze this morpheme neither as a nominative marker nor as a topic marker. Here we propose analyzing this suffix as a marker of specificity. There is no doubt that -i has to do with specificity: the evidence from adjective predicate structure in subsection 3.5 shows this. Inui (2012) has some examples (25-28) of unmarked object without -a. As with 'óós (work)' of (20), the object does not refer to a specific individual, but simply has a generic or abstract meaning. In such a case, the suffix -i is infelicitous.
(29) táání    zináábo    ɦá    úúɸ-í       múj-ára     maɗ-íne
     1SG.NOM  yesterday  this  injera-ACC  eat-CNV.SS  become.sick-PF
     'I ate this bread yesterday and became sick.'
We can regard the conditions for appearance of the specific marker in question as a hierarchy of individuation or a hierarchy of salience. Salience is not treated as a primitive in itself, but rather as the result of the interaction of a number of factors, such as animacy, specificity, singularity, and concreteness (see Comrie 1989: 199).
In summary, the suffix -i suggests that the referent of the noun phrase in question is important and relevant for the discourse as a whole. Nouns, whether subject or object, are activated in discourse by adding the morpheme -i. The subject is more frequently marked by this morpheme, because the subject is more salient than the object in discourse. Here, we tentatively propose that this morpheme functions as a marker of specificity.

-n- as a topic marker
Here we discuss the information function of the morpheme -n- of personal pronouns. It will be useful to utilize Lambrecht's definition of topic and focus (1994) and Givón's scale for the coding of topic accessibility explained in section 2.
The short and long forms of the first- and second-person subject pronouns may be used alternatively in the same context, apparently without any semantic difference, in several Ometo languages. So far, no analysis of the actual use of the short and long pronouns has been made. Rapold (2006: 341-363) discussed the various forms of pronoun in Bench, which uses two parameters, long or short, and strong or weak tone 22 , giving four combinations. Rapold reports that the long strong pronouns are the most discontinuous subject pronouns: they are typically used to code new or resumed topics, while short strong pronouns signal a higher topic continuity and also code subject focus. On the other hand, short weak pronouns are the most continuous (overt) subject pronouns, and long weak pronouns signal higher topic continuity than long strong or short strong pronouns, but are slightly more discontinuous than the short weak pronouns. The correlation between topic continuity and the various forms of pronouns in Bench is shown in Fig. 2.
22 Pronouns with the shape CV are termed "short", and those with the shape CVC "long". Pronouns with tone 3 (a neutral mid tone) are termed "weak", those with tone 1 (a salient, extreme low tone), which are pragmatically more marked, "strong".
In this language, focus seems to be determined by stress accent, whereas the morpheme -n- of the Bench long form seems to be characterized as a topic marker. The information structure of the four combinations is shown in Table 13.
Considering what types of nominal are likely to be used as focus and topic, zero marking is used when the referent intended is the most accessible one, generally an activated referent, typically the current topic of conversation. Use of a pronoun guarantees that the referent intended is either activated (especially if unstressed) or at least accessible (if stressed). Use of a definite NP guarantees that the referent intended is identifiable, and generally both inactive and accessible. Use of an indefinite NP generally tells the hearer that the referent is not identifiable in the current context and hence is a new referent being introduced into the context. Thus, zero coding is used for a topic, while realization as an indefinite NP is used for a focal element. Typically, subject has to do with topic, while object has to do with focus. According to Givón (1979: 51-52), in an English text count, 50% of the direct objects were indefinite and 82% of the indefinite NPs were direct objects. We arrive at a scale of markedness relations between the form of a referring expression and its function as topic or focus, as shown in Fig. 3.
Turning to the data of Basketo, zero anaphora, whether of subject or object, functions as a topic, but is backgrounded in discourse. It is important that the short form is the same as the possessive in Basketo. In general, the possessive does not bear any pragmatic information in discourse, as explained in section 2. Therefore, short pronouns also have no pragmatic function, but show what is subject or agent in a clause: this form shows only grammatical or semantic function in the clause. On the other hand, long pronouns are morphologically and pragmatically marked. Therefore, the morpheme -n- marks a foregrounded discourse topic in contrast with zero anaphora, including such topics as normal topic, resumed topic, contrastive topic, unpredictable topic 23 , and even focus-like cleft NPs with high animacy.
For example, if someone says It was John that ate the cake, the referent of the name John must already be known to the hearer, namely this is its identifiability status in the mind of the hearer. In this case someone ate the cake is the presupposition, someone = John is the assertion, the new information, and John is the focus of the utterance. Thus, we can consider the continuum from topic to focus as the remit of morpheme -n-.

The use of nominal suffixes -i and -n- in discourse
Here we will discuss the appearance of both nominal suffixes (-i and -n-) in the corpus. It may be difficult to find applicable examples from spontaneous speech, but we try to provide evidence for these functions. The following text is the first part of the corpus. We found three types of anaphoric subject form: zero, short, and long forms. Zero anaphora (3-A and 4-B) shows the most activated topic as a backgrounded topic, while long forms (2-B and 3-A) are foregrounded to express a contrastive topic. On the other hand, short forms (1-A and 4-B) may occupy a neutral position between backgrounded and foregrounded topic. Turning to specificity, -i marks daraʤ-í as specific in 3-A, while zero anaphora is used in 4-B. Likewise, in the following text, the specific suffix -i with a possessive is used in 8-B, the specific suffix -i in 9-C and zero anaphora in 10-B. In both cases, specific nouns 24 may be marked by the suffix -i, and if they are activated in discourse, they may become zero anaphora as a backgrounded topic.
(-ACC) do-IMPF 'Yes, I will.'

Further evidence from word order
We discussed the information function of two nominal suffixes, and analyzed -i as a specific marker and -n- as a topic marker. Finally, we can show some evidence from word order supporting these analyses. The following examples show the simple transitive (a), the corresponding passive voice (b) and the OSV word order (c). In general, the passive construction is a strategy foregrounding the patient, while backgrounding the agent. The patient is promoted from accusative to nominative while the agent is demoted from nominative to oblique case or often deleted. Basketo prefers the OSV word order to the passive construction 25 . Moreover, a subject with low animacy (such as 'bedbugs') tends to be avoided. Examples (30c) and (31c) show sentence-initial accusative nouns with the suffix -n- as a topic marker, whereas the nominative nouns have the suffix -i as a specific marker for salience.

Conclusion
We discussed the information function of two morphemes in Basketo: the nominal suffix -i, and -n- in first- and second-person pronouns. First, though it has been said that Basketo has a nominative-accusative system, the suffix -i of Basketo can appear on both subject and object in the same sentence. We therefore cannot regard this morpheme as a nominative or topic marker. Here we analyze the morpheme -i as a specific marker. Second, there are short and long forms of the first- and second-person subject pronouns. The short form is the same as the possessive form. In general, the possessive does not bear any pragmatic information in discourse. Likewise, short pronouns have no pragmatic function, but show what is subject or agent in a clause. On the other hand, long pronouns are not only morphologically but also pragmatically marked. We analyze the morpheme -n- as marking the foregrounded topic in discourse, in contrast with zero anaphora or a short pronoun as the backgrounded topic. To confirm this analysis, more data from natural discourse need to be collected.