In this thesis, we developed methods for disambiguating the discourse usage and sense of connectives in free Turkish text. For this purpose, we first built a comprehensive Turkish Connective Lexicon (TCL) that covers all types of connectives in Turkish together with their syntactic and semantic features. The lexicon was built automatically from the discourse relation annotations in several discourse-annotated corpora developed for Turkish, and it follows the format of the German connective lexicon, DiMLex.
Like many other languages, Turkish has lexical connectives (referred to as single and phrasal connectives in this work); it also has suffixal discourse connectives. We developed a rule-based Turkish Connective Disambiguator (TCD) to resolve the usage ambiguity of single, phrasal, and suffixal connectives. We then designed machine learning models to disambiguate the discourse usage of connectives and to disambiguate the sense of discourse connectives. We evaluated the TCD and the machine learning models by comparing their results with the human annotations in the Turkish section of the TED-Multilingual Discourse Bank and in Turkish Discourse Bank 1.1. We observed that the machine learning approach outperforms the baseline rule-based approach, although both approaches yield good results.
Within the scope of this thesis, we also developed user-friendly interfaces for the TCL and TCD programs. The TCL program lists the discourse connectives in Turkish with their features and offers several filtering and analysis capabilities. The TCD program loads a selected free Turkish text into its interface and marks the discourse and non-discourse occurrences of connectives in the text. If the selected file has a corresponding annotation file, the program also evaluates the disambiguation results automatically.
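The automatic evaluation step can be illustrated with a minimal sketch. It is not the TCD implementation: the `Span` type, the `evaluate` function, and the choice of precision, recall, and F1 over exact span matches are all assumptions made for illustration, since this summary does not state which metrics or matching criteria the program uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Character offsets of one connective marked as discourse usage (hypothetical type)."""
    start: int
    end: int


def evaluate(predicted: set[Span], gold: set[Span]) -> dict[str, float]:
    """Compare predicted discourse connectives with gold annotations.

    Assumes exact span matching; partial overlaps count as errors.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative offsets only; these are not taken from any corpus.
predicted_spans = {Span(0, 2), Span(10, 14), Span(20, 25)}
gold_spans = {Span(0, 2), Span(20, 25), Span(30, 33)}
scores = evaluate(predicted_spans, gold_spans)
print(scores)
```

Exact matching is the strictest choice; a real evaluation might instead credit partial overlaps or score discourse and non-discourse labels separately.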
This thesis contributes to Turkish discourse parsing by resolving the usage ambiguity of single, phrasal, and suffixal connectives, which, to the best of our knowledge, is attempted here for the first time. It is also the first work to attempt to disambiguate the sense of all types of discourse connectives in Turkish. In this respect, we expect the thesis to set baselines for future work on Turkish connective disambiguation and to pave the way for future research in Turkish discourse parsing.