Special issue of Dialogue and Discourse on: Beyond semantics: the challenges of annotating pragmatic and discourse phenomena
Stefanie Dipper, Ruhr University Bochum, Germany
Heike Zinsmeister, Konstanz University, Germany
Bonnie Webber, Edinburgh University, UK
Sep 7 2011: Open call
Nov 15 2011: Expression of interest, three-page abstract
NEW: March 1 2012: Submission deadline, full papers
TBA: Notification of acceptance
TBA: Final versions due
TBA: Publication (tentative date)
The topic of the special issue is Beyond semantics: the challenges of annotating pragmatic and discourse phenomena. The focus is on the problems and challenges that are specific to annotating phenomena that are "beyond semantics", i.e., pragmatic and discourse-related phenomena (e.g. anaphoric reference, information structure, discourse relations, discourse function, presupposition, subjectivity).
In this domain, it is often hard to transfer results from theoretical linguistics that are based on toy examples to naturally-occurring texts. Even given explicit annotation guidelines, it is often difficult to annotate texts reliably. For instance, Ritz et al. (2008) and Cook and Bildhauer (2011) show that annotating information-structural features often results in inter-annotator agreement scores well below kappa=.6 (Cohen, 1960); such scores are often assumed to allow for tentative conclusions only (Landis and Koch, 1977). Similarly, the overview by Artstein and Poesio (2008) shows that the annotation of discourse-related features, such as dialogue act tagging or discourse segmentation, as well as word sense tagging, also achieves low kappa scores in many studies. A possible approach to tackling these problems is the use of proxies (surface clues) (cf. Prasad et al., 2008).
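For readers less familiar with the agreement measure mentioned above, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators who label the same items. The function name `cohen_kappa` and the toy "given"/"new" labels are illustrative assumptions and are not taken from any of the cited studies.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance, estimated from each
    annotator's own label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over categories of the product of the
    # two annotators' relative frequencies for that category.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: ten items, two annotators, two categories.
annotator_1 = ["given"] * 5 + ["new"] * 5
annotator_2 = ["given"] * 4 + ["new"] * 4 + ["given"] * 2

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on 7 of 10 items, but half of that agreement is expected by chance, so kappa comes out at about 0.4, well below the .6 level discussed above.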
The goal of this special issue is to enhance mutual awareness of people working on different kinds of "beyond" phenomena, from different perspectives. Ideally, the special issue will allow people to realize that there are problems they share---despite the fact that they are working on quite different tasks---and to recognize (partial) solutions that they too might be able to adopt.
We also see it as an important desideratum to promote the application of linguistic theories to naturally-occurring texts. This would advance the search for operationalizations of theoretical concepts, which could then probably be annotated with higher reliability. It would also open up corpus-based development and validation of theoretical hypotheses. At the same time, operationalized theoretical concepts and reliable annotations would facilitate the use of pragmatic and discourse-related knowledge in computational linguistics.
This means, on the one hand, that we need more theoretical linguists annotating corpora and validating their theories based on corpora, and, on the other hand, more computational linguists drawing from linguistic insights to a greater extent when annotating training data.
We think it is time to bring together theoretical linguists who use texts and corpora for pragmatic or discourse-related research questions, and corpus linguists as well as computational linguists who create and annotate relevant corpus resources, or exploit them. The goal of the special issue is to enhance exchange and awareness between researchers of both fields, and to gain insights into the (possibly common) properties and peculiarities of the "beyond" phenomena. A prime example of the kind of corpus-based research that we have in mind is the classical paper by Prince (1981).
- Ron Artstein and Massimo Poesio. Inter-coder agreement for computational linguistics (survey article). Computational Linguistics, 34(4):555–596, 2008.
- Jacob Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960.
- Philippa Cook and Felix Bildhauer. Annotating information structure: The case of topic. In Stefanie Dipper and Heike Zinsmeister, editors, Beyond Semantics: Corpus-based Investigations of Pragmatic and Discourse Phenomena. Proceedings of the DGfS Workshop, Göttingen, volume 3 of BLA (Bochumer Linguistische Arbeiten), 2011.
- J. Richard Landis and Gary G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33(1), 1977.
- Ellen F. Prince. Toward a taxonomy of given-new information. In Peter Cole, editor, Radical Pragmatics, pages 223–255. Academic Press, New York, 1981.
- Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, and Bonnie Webber. The Penn Discourse Treebank 2.0. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC), 2008.
- Julia Ritz, Stefanie Dipper, and Michael Götze. Annotation of information structure: An evaluation across different types of texts. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC-08), pages 2137–2142, Marrakech, Morocco, 2008.
Topics of Interest
The overall guiding question of the special issue is: How do we annotate abstract pragmatic and discourse information? Such information is frequently not marked explicitly or unambiguously in natural language. It is usually dependent on context information, and annotators often have to reconstruct complex relations and situations from the context. Intuitions about pragmatic or discourse analysis tend to be less stable and more subjective than intuitions about syntactic or semantic phenomena.
The idea is to gather research that reports on the generation (and exploitation) of corpora that are annotated with pragmatic or discourse-related information grounded in linguistic theory.
Example questions that we would like to see addressed in the special issue are:
- In annotating texts, which methods are applied? For instance, to what extent are linguistic concepts replaced by surface proxies?
- To what extent does the format of annotation (different layers vs. one layer only) influence the annotation task?
- What kinds of instructions are given to the annotators? Do they have to generalize from a set of given examples? Are they given a formal definition, whose applicability they are assumed to always test before choosing a particular label? Are there linguistic tests to guide the annotation?
Potential contributors are invited to send an expression of interest (EOI) to the guest editors by November 15, 2011. The EOIs should consist of a title and a three-page abstract. EOIs should be directed to the guest editors via beyondsem [AT] linguistics.rub.de.
Full manuscripts need to be formatted according to "Dialogue and Discourse" author guidelines, and submitted using the journal's online manuscript submission system (go to Login; choose Section "Annotating Pragmatic and Discourse Phenomena"). As a guideline, full articles should be around 30 pages, but if justified, significantly shorter or longer papers will be considered as well.
Please do not hesitate to contact us (beyondsem [AT] linguistics.rub.de) if you have any practical questions or are unsure about whether a possible topic you would like to write about would fall under this call.
The special issue aims at significant interaction between the two target audiences, theoretical linguists with an emphasis on detailed analysis of specific phenomena and computational linguists specializing in annotation. Therefore, each paper will have at least one referee from each "camp", to ensure both that the theoretical papers are technically strong and that the computational papers have sufficient empirical content.
The following people have already agreed to serve on the reviewing committee:
Maria Averintseva-Klisch (Tuebingen University, Germany)
Cathrine Fabricius-Hansen (Oslo University, Norway)
Klaus von Heusinger (Stuttgart University, Germany)
Ralf Klabunde (Ruhr-University Bochum, Germany)
Valia Kordoni (DFKI GmbH and Saarland University, Germany)
Rebecca Passonneau (Columbia University, USA)
Massimo Poesio (University of Essex, UK, and Trento, Italy)
Kiril Simov (Bulgarian Academy of Sciences, Sofia, Bulgaria)
Caroline Sporleder (Saarland University, Germany)
Angelika Storrer (TU Dortmund, Germany)
Michael Strube (HITS Heidelberg, Germany)
Dialogue and Discourse (D&D) is the first peer-reviewed open access journal dedicated exclusively to work that deals with language "beyond the single sentence". The journal adopts an interdisciplinary perspective, accepting work from Linguistics, Computer Science, Psychology, Sociology, Philosophy, and other associated fields with an interest in formally, technically, empirically or experimentally rigorous approaches. The journal is committed to ensuring the highest editorial standards and rigorous peer-review of all submissions, while granting open access to all interested readers.