TODO: comment on datasets

Klappholz and Lockman (1977;

History

From Hirst (1981)[1]: "The high-school algebra problem answering system STUDENT (Bobrow 1964), an early system with natural language input, has only a few limited heuristics for resolving anaphors and, more particularly, anaphor-like paraphrases and incomplete repetitions.


Winograd's (1971, 1972) celebrated SHRDLU system ... providing impressive and, for the most part, sophisticated handling of anaphors, including references to earlier parts of the conversation between the program and its user."

Terminology: Winograd (1972[2], p. 30) uses the terms "back-reference" and "pronoun reference".

Applications

TODO: Recasens and Hovy (2009)[3]: "Coreference resolution [...] has been shown to be beneficial in many NLP applications such as Information Extraction [1][4], Text Summarization [2][5], Question Answering [3][6], and Machine Translation."

Information extraction

From Soon et al. (2001)[7]: "information extraction (IE) systems like those built in the DARPA Message Understanding Conferences (Chinchor 1998; Sundheim 1995) have revealed that coreference resolution is such a critical component of IE systems"

Analysis

TODO: Hajishirzi et al. (2013)[8]: "The biggest challenge in coreference resolution — accounting for 42% of errors in the state-of-the-art Stanford system—is the inability to reason effectively about background semantic knowledge (Lee et al., 2013)[9]."

Error analysis

Recall analysis: Martschat & Strube (2014)[10]

link-based error analysis (Uryupina, 2008[11]; Martschat, 2013[12])

transformation-based error analysis (Kummerfeld and Klein, 2013[13])

Subproblems

Scores can be calculated separately for nominals, pronouns, and proper names (Ng and Cardie, 2002[14]; Haghighi and Klein, 2009[15]).

Broad-referring expressions

"It" is a particularly difficult case for coreference resolution. It may refer to singular inanimate objects, some animals, abstractions or events, or to nothing specific (pleonastic uses). However, see also Li et al. (2009)[16], in which the authors claim to identify pleonastic "it" with accuracy "comparable to those obtained by human efforts".

"They" is slightly easier than "it" but still more difficult than other pronouns.

"This" and "that" can also refer to abstractions, which makes their range of possible referents rather broad (McShane and Babkin, 2015)[17]. However, in most cases I found in OntoNotes, they are followed by a noun, as in "this area", "this facility", etc.

TODO: some special treatments: Kolhatkar and Hirst (2011)[18], Müller (2008)[19].

Kolhatkar et al. (2013)[20]: "Anaphoric shell nouns (ASNs) such as this fact, this possibility, and this issue are common in all kinds of text. They are called shell nouns because they provide nominal conceptual shells for complex chunks of information representing abstract concepts such as fact, proposition, and event (Schmid, 2000)."

Pronouns

From Wiseman et al. (2016)[21]: "Wiseman et al. (2015) show that on the CoNLL 2012 English development set, almost 59% of mention-ranking precision errors and almost 24% of recall errors involve pronominal mentions. Martschat and Strube (2015)[22] found a similar pattern in their comparison of mention-ranking, mention-pair, and latent-tree models."

Opaque mentions

Recasens et al. (2013)[23]: "Coreference resolution systems rely heavily on string overlap (e.g., Google Inc. and Google), performing badly on mentions with very different words (opaque mentions) like Google and the search giant."

Types of coreferencing expressions

TODO: referential hierarchies of Ariel (1988)[24] or Gundel et al. (1993)[25]

Distribution of the different anaphors in ACE (Table 1 in Denis and Baldridge (2008)[26]):

Type             Train     Test
3rd pron.        4,389    1,093
speech pron.     2,178      610
proper names     7,868    1,532
def. NPs         3,124      796
others           1,763      568
Total           19,322    4,599

Open problems

From McShane and Babkin (2016)[27]: "Among the more difficult referring expressions are so-called broad referring expressions, such as pronominal this and that [...] In addition to untreated referring expressions, there are referring expressions that have been widely treated but have resisted high-precision results. One example is third person personal pronouns. The reason for the low precision is that resolution often requires specific world knowledge and reasoning, as illustrated by Winograd Schema examples like The man_i could not lift his son_k because [he_i was so weak / he_k was so heavy] (Levesque et al., 2012)."

Linguistic theories

Centering

Focusing

Discourse representation theory

TODO: (Cormack, 1993[28]; Abraços and Lopes, 1994[29])

I can't get a PDF file of Cormack (1993). Judging from Abraços and Lopes (1994), they take a very different approach to coreference resolution. They propose and evaluate rules such as the "recency rule" (look at the last constituent in the previous sentence), focus movement (what changes the focus), relative clause (the following sentence tends to bind to the main clause instead of the relative clause), etc. To me these rules look brittle and ad hoc. Besides, there is no obvious mechanism for resolving rule conflicts.

Approaches

Classified based on inferencing method

  • Rule-based
  • Inference-based: Inoue et al. (2012)[30]
  • Machine learning: see Elango (2005)[31]
    • Naïve Bayes
    • Decision tree
    • Conditional random fields (McCallum and Wellner 2005[32], etc.)
    • Integer Linear Programming (a review: Rizzolo and Roth (2016)[33])
    • Markov logic:
      • Poon & Domingos (2008)[34]: use MLN to encode (soft) rules such as: type/number/gender matching, apposition (e.g. Bill Gates, the chairman of Microsoft) and predicate nominals (e.g. he is Bill Gates).
      • Bögel & Frank (2013)[35]: TODO
    • Neural networks: Clark (2015)[36], Clark and Manning (2016a[37], 2016b[38]), Wiseman et al. (2016)[21]

Classified based on source of information

Semantic knowledge

Lee et al. (2013)[9]: "Haghighi and Klein found that this transductive learning was essential for semantic knowledge to be useful (Aria Haghighi, personal communication); other researchers have found that semantic knowledge derived from Web resources can be quite noisy (Uryupina et al. 2011a)."

Discourse-based

Discourse-based methods take into account aspects of discourse such as coherence and centering.

From Lappin and Leass (1994)[39]: "Discourse Based Methods Most of the work in this area seeks to formulate general principles of discourse structure and interpretation and to integrate methods of anaphora resolution into a computational model of discourse interpretation (and sometimes of generation as well). Sidner (1981, 1983), Grosz, Joshi, and Weinstein (1983, 1986), Grosz and Sidner (1986), Brennan, Friedman, and Pollard (1987), and Webber (1988) present different versions of this approach. Dynamic properties of discourse, especially coherence and focusing, are invoked as the primary basis for identifying antecedence candidates; selecting a candidate as the antecedent of a pronoun in discourse involves additional constraints of a syntactic, semantic, and pragmatic nature."

Potential problems:

  • From Lappin and Leass (1994)[39]: "... assign too dominant a role to coherence and focus in antecedent selection. As a result, they establish a strong preference for intersentential over intrasentential anaphora resolution. This is the case with the anaphora resolution algorithm described by Brennan, Friedman, and Pollard (1987)."
  • Alshawi (1987, p. 62; as cited in Lappin and Leass, 1994[39]): an algorithm/model relying on the relative salience of all entities evoked by a text, with a mechanism for removing or filtering entities whose salience falls below a threshold, is preferable to models that "make assumptions about a single (if shifting) focus of attention."
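
The salience-with-threshold idea attributed to Alshawi above can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the `SalienceTracker` class, the per-sentence decay factor, the mention weight, and the threshold value are hypothetical and are not taken from Alshawi (1987) or Lappin and Leass (1994).

```python
from dataclasses import dataclass, field


@dataclass
class SalienceTracker:
    """Toy salience store: every evoked entity keeps a decaying score."""

    decay: float = 0.5       # assumed per-sentence decay factor
    threshold: float = 0.2   # assumed cut-off below which entities are dropped
    scores: dict = field(default_factory=dict)

    def next_sentence(self):
        # Decay every entity, then filter out those that fall below the threshold.
        self.scores = {
            entity: score * self.decay
            for entity, score in self.scores.items()
            if score * self.decay >= self.threshold
        }

    def mention(self, entity, weight=1.0):
        # A new mention adds salience to the entity it evokes.
        self.scores[entity] = self.scores.get(entity, 0.0) + weight

    def ranked_candidates(self):
        # All surviving entities compete, ordered by relative salience.
        return sorted(self.scores, key=self.scores.get, reverse=True)


tracker = SalienceTracker()
tracker.mention("John")
tracker.mention("the report")
tracker.next_sentence()
tracker.mention("the report")
tracker.next_sentence()
tracker.next_sentence()
print(tracker.ranked_candidates())
```

Unlike a single-focus model, nothing here designates one entity as "the" focus: all entities above the threshold remain candidates, and an entity that stops being mentioned simply fades out.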

Mixed models

These models combine syntactic, semantic, and discourse factors, among others. Examples: Lappin and Leass (1994)[39], Asher and Wada (1988), Carbonell and Brown (1988), and Rich and LuperFoy (1988).

Classified based on the construction of coreference chain

See also: Ng (2010)[40], Heng Ji's slide, Martschat and Strube (2015)[22]

TODO: latent-tree models?

To construct a coreference chain, one can either consider each element separately or match a candidate element against a partial chain. The major approaches to this problem are:

  • Mention-pair model: whether two mentions are coreferential or not
    • (Soon et al. 2001; Ng and Cardie 2002; Ji et al., 2005; McCallum & Wellner, 2004; Nicolae & Nicolae, 2006)
    • Chang et al. (2013)[41]: "We model the task of coreference resolution using a pairwise scorer which indicates the compatibility of a pair of mentions. The inference routine then predicts the final clustering — a structured prediction problem—using these pairwise scores."
  • Entity-mention model: whether a mention and a preceding (partial) cluster are coreferential or not
    • Ref:
      • Pasula et al. 2003[42] <-- citation matching;
      • Luo et al. 2004[43]; Yang et al. 2004, 2008; Daume & Marcu, 2005; Culotta et al., 2007; Lee et al., 2013[9]
    • antecedent trees (Yu and Joachims, 2009[44]; Fernandes et al., 2014[45]; Björkelund and Kuhn, 2014[46])
      • From Fernandes et al.: "We introduce coreference trees to represent mention clusters. A coreference tree is a directed tree whose nodes are the coreferring mentions in a cluster and whose arcs"
  • Mention-ranking model (also called mention-synchronous[47]): which of the preceding mentions is coreferential to a given mention
    • Ref: Denis & Baldridge 2007, 2008
    • Special case: rank two candidate NPs, called tournament model by Iida et al. (2003)[48] and the twin-candidate model by Yang et al. (2003[49]; 2008b[50])
  • Cluster-ranking model: which of the preceding clusters is coreferential to a given mention
    • Ref: Rahman and Ng (2009)[51]
  • Merging clusters (sometimes called entity-centric):
    • From Clark and Manning (2015)[52]: "Our entity-centric “agent” builds up coreference chains with agglomerative clustering. It begins in a start state where each mention is in a separate single-element cluster. At each step, it observes the current state s, which consists of all partially formed coreference clusters produced so far, and selects some action a which merges two existing clusters. The action will result in a new state with new candidate actions and the process is repeated. The model is entity-centric in that it builds"
  • TODO: transition-based approach (somewhat between mention ranking and cluster merging?): Webster and Curran (2014)[53]
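
To make the difference between the first and third items in the list above concrete, here is a minimal sketch that decodes the same pairwise scores in two ways. The score table `SCORES`, the function names, and the threshold are all hypothetical; closest-first linking is only one common way to decode a mention-pair classifier, and the threshold in the ranking variant is a crude stand-in for an anaphoricity decision.

```python
# Hypothetical pairwise scores for five mentions, keyed by (antecedent, mention).
SCORES = {(0, 1): 0.6, (0, 2): 0.9, (1, 2): 0.7, (0, 4): 0.95, (2, 4): 0.55}


def score(i, j):
    return SCORES.get((i, j), 0.0)


def mention_pair_links(n_mentions, score, threshold=0.5):
    """Closest-first decoding: link each mention to the nearest preceding
    mention whose pairwise score clears the threshold."""
    links = {}
    for j in range(n_mentions):
        for i in range(j - 1, -1, -1):  # scan candidates right to left
            if score(i, j) > threshold:
                links[j] = i
                break
    return links


def mention_ranking_links(n_mentions, score, threshold=0.5):
    """Ranking decoding: link each mention to its single best-scoring
    preceding mention, if that score clears the threshold."""
    links = {}
    for j in range(1, n_mentions):
        best = max(range(j), key=lambda i: score(i, j))
        if score(best, j) > threshold:
            links[j] = best
    return links


def chains(n_mentions, links):
    """Follow antecedent links to group mentions into coreference chains."""
    root = list(range(n_mentions))
    for j in range(n_mentions):
        if j in links:
            root[j] = root[links[j]]  # links[j] < j, so its root is final
    groups = {}
    for j, r in enumerate(root):
        groups.setdefault(r, []).append(j)
    return sorted(groups.values())


pair_links = mention_pair_links(5, score)
rank_links = mention_ranking_links(5, score)
print(pair_links, chains(5, pair_links))
print(rank_links, chains(5, rank_links))
```

On this toy input both decoders produce the same chains but through different links: the mention-pair decoder makes independent yes/no decisions and stops at the first acceptable antecedent, whereas the ranking decoder compares all candidates for a mention against each other.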

From Ng (2010)[40]: "An important issue with ranking models that we have eluded so far concerns the identification of non-anaphoric NPs. As a ranker simply imposes a ranking on candidate antecedents or preceding clusters, it cannot determine whether an NP is anaphoric (and hence should be resolved). To address this problem, Denis and Baldridge (2008) apply an independently trained anaphoricity classifier to identify non-anaphoric NPs prior to ranking, and Rahman and Ng (2009) propose a model that jointly learns coreference and anaphoricity"
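
The two remedies in the quote can be contrasted in a short sketch. The function names, candidate scores, and the use of a dummy "null" antecedent are assumptions for illustration; in particular, letting a null candidate compete in the ranking is one common way to model anaphoricity jointly, and no claim is made that it matches the exact formulation of Rahman and Ng (2009).

```python
def resolve_pipeline(candidates, is_anaphoric):
    """Pipeline variant: a separate anaphoricity decision gates the ranker."""
    if not is_anaphoric or not candidates:
        return None
    return max(candidates, key=candidates.get)


def resolve_joint(candidates, null_score):
    """Joint variant: a null 'antecedent' (None) competes with real candidates."""
    scored = dict(candidates)
    scored[None] = null_score
    return max(scored, key=scored.get)


# Hypothetical ranker scores for the candidate antecedents of one mention.
candidates = {"Obama": 0.3, "the senator": 0.2}

print(resolve_pipeline(candidates, is_anaphoric=True))
print(resolve_pipeline(candidates, is_anaphoric=False))
print(resolve_joint(candidates, null_score=0.5))
print(resolve_joint(candidates, null_score=0.1))
```

In the pipeline, an error by the anaphoricity classifier cannot be undone by the ranker; in the joint variant, the decision not to resolve is just another outcome of the same ranking.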

Entity-level models are expected to perform better than mention-level models because the former have access to more information. An example from Lee et al. (2013)[9]:

"As an illustration, the following text shows an example where the incorrect decision is taken if feature sharing is disabled:
This was the best result of a Chinese gymnast in 4 days of competition... It was the best result for Greek gymnasts since they began taking part in gymnastic internationals.
In the example text, the mention-pair model incorrectly links This and It, because all the features that can be extracted locally are compatible (e.g., number is singular for both pronouns). On the other hand, the entity-centric model avoids this decision because, in a previous sieve driven by predicate nominative relations, these pronouns are each linked to incompatible noun phrases, i.e., the best result of a Chinese gymnast and the best result for Greek gymnasts."

Features

See Features for entity coreference resolution

Designing cluster(entity)-level features

From Clark and Manning (2016a)[37]: "A long-standing challenge in coreference resolution has been the incorporation of entity-level information -- features defined over clusters of mentions instead of mention pairs."

From Wiseman et al. (2016)[21]: "We believe a major reason for the relative ineffectiveness of global features in coreference problems is that, as noted by Clark and Manning (2015), cluster-level features can be hard to define"


  • Individual: score_cluster(c1, c2) = pooling over all score_mention(m1 in c1, m2 in c2) -- This approach represents the relationships between individual mentions of the two clusters. It is used in Clark and Manning (2016a)[37].
  • Summary: score_cluster(c1, c2) = score(summarize(c1), summarize(c2)), where the summarize function iterates through the mentions of a cluster and returns a shared representation. -- This approach stresses the cluster representation while sacrificing the relationships between mentions of different clusters. Though in theory it can still capture such relationships through the summary representation, a large part of the information is likely lost.
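
The two formulas above can be written out as a small sketch. The dot-product mention scorer, the mean-vector `summarize`, and the choice of pooling function are all assumptions made for illustration; they are not the models used in the cited papers.

```python
from itertools import product
from statistics import mean


def score_mention(m1, m2):
    """Hypothetical mention-pair scorer: dot product of feature vectors."""
    return sum(a * b for a, b in zip(m1, m2))


def score_cluster_individual(c1, c2, pool=max):
    """'Individual' variant: pool mention-pair scores across the two clusters."""
    return pool(score_mention(m1, m2) for m1, m2 in product(c1, c2))


def summarize(cluster):
    """Hypothetical summary: element-wise mean of the mention vectors."""
    return [mean(dim) for dim in zip(*cluster)]


def score_cluster_summary(c1, c2):
    """'Summary' variant: score one cluster summary against the other."""
    return score_mention(summarize(c1), summarize(c2))


c1 = [[1.0, 0.0], [0.0, 1.0]]  # two mentions with very different features
c2 = [[1.0, 0.0]]              # one mention that matches the first of c1

print(score_cluster_individual(c1, c2))        # max pooling over pairs
print(score_cluster_individual(c1, c2, mean))  # average pooling over pairs
print(score_cluster_summary(c1, c2))           # score of the two summaries
```

With max pooling the single strong mention-to-mention match is preserved (1.0), while the summary variant averages it away (0.5), which is the kind of information loss the second bullet describes.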

The crudest form of summarization uses rules like "some mentions are names", "all mentions are singular", etc. "Early attempts at defining cluster-level features simply applied the coarse quantifier predicates all, none, most to the mention-level features defined on the mentions (or pairs of mentions) in a cluster (Culotta et al., 2007; Rahman and Ng, 2011)." (Wiseman et al. 2016)[21]
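
A minimal sketch of such quantifier-based cluster features follows. The mention dictionaries, the predicate names, and the extra "some" fallback label are assumptions for illustration; the quoted passage only names all, none, and most.

```python
def quantify(flags):
    """Collapse per-mention booleans into one coarse quantifier label."""
    flags = list(flags)
    hits = sum(flags)
    if hits == len(flags):
        return "all"
    if hits == 0:
        return "none"
    if 2 * hits > len(flags):
        return "most"
    return "some"  # assumed fallback label, not named in the quoted passage


def cluster_features(cluster, predicates):
    """Apply every mention-level predicate to a cluster and quantify it."""
    return {
        name: quantify(pred(mention) for mention in cluster)
        for name, pred in predicates.items()
    }


# Hypothetical mention-level predicates.
predicates = {
    "is_singular": lambda m: m["number"] == "sg",
    "is_plural": lambda m: m["number"] == "pl",
    "is_name": lambda m: m["type"] == "name",
    "is_pronoun": lambda m: m["type"] == "pronoun",
}

cluster = [
    {"type": "name", "number": "sg"},
    {"type": "pronoun", "number": "sg"},
    {"type": "pronoun", "number": "sg"},
]

print(cluster_features(cluster, predicates))
```

The resulting feature vector has a fixed size regardless of cluster size, which is convenient for a classifier but discards which mention had which property.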

At the other extreme, Björkelund and Kuhn (2014)[46] attempt to preserve information by concatenating information found in the mentions (e.g. a feature C-P-P for a cluster that has a common noun followed by two pronouns).
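
As a rough illustration of this kind of concatenated feature, the sketch below builds the C-P-P string from an ordered list of mention types. Only the codes C (common noun) and P (pronoun) come from the example above; the code N for proper names and the function name are assumptions.

```python
# C and P follow the example in the text; N for proper names is assumed.
TYPE_CODES = {"common": "C", "pronoun": "P", "proper": "N"}


def cluster_shape(mention_types):
    """Concatenate per-mention type codes, in textual order, into one feature."""
    return "-".join(TYPE_CODES[t] for t in mention_types)


print(cluster_shape(["common", "pronoun", "pronoun"]))
```

Unlike quantifier predicates, this feature keeps the order and count of mentions, at the cost of a feature space that grows with cluster length.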

TODO: Wiseman et al. (2016)[21]: "Bjorkelund and Kuhn (2014), Martschat and Strube (2015)[22], Clark and Manning (2015)"

Local vs. global features

From Wiseman et al. (2016)[21]: "we might expect non-local models with access to global features to perform significantly better. However, models incorporating nonlocal features have a rather mixed track record. For instance, Björkelund and Kuhn (2014) found that cluster-level features improved their results, whereas Martschat and Strube (2015) found that they did not. Clark and Manning (2015) found that incorporating cluster-level features beyond those involving the precomputed mention-pair and mention-ranking probabilities that form the basis of their agglomerative clustering coreference system did not improve performance. Furthermore, among recent, state-of-the-art systems, mention-ranking systems (which are completely local) perform at least as well as their more structured counterparts (Durrett and Klein, 2014; Clark and Manning, 2015; Wiseman et al., 2015; Peng et al., 2015)."

Pipeline

Preceding tasks

Syntax, NER, etc.

Mention detection

Mention classification?

Anaphoric identification

Ng and Cardie (2002)[54]

Coreference resolution

Creating training examples: problem of class imbalance. Recasens and Hovy (2009)[3] find balancing training instances to be ineffective for TiMBL.
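
The class imbalance is easy to see by counting instances. The sketch below compares pairing every mention with every preceding mention against a sampling heuristic in the style of Soon et al. (2001)[7], where the positive instance is the closest coreferent antecedent and the negatives are the mentions in between. The toy entity labels and function names are hypothetical.

```python
def all_pairs(entity_ids):
    """Every (antecedent, anaphor) index pair with its coreference label."""
    return [
        (i, j, entity_ids[i] == entity_ids[j])
        for j in range(len(entity_ids))
        for i in range(j)
    ]


def closest_antecedent_pairs(entity_ids):
    """Positive: closest coreferent antecedent; negatives: mentions in between."""
    instances = []
    for j in range(len(entity_ids)):
        for i in range(j - 1, -1, -1):
            if entity_ids[i] == entity_ids[j]:
                instances.append((i, j, True))
                instances.extend((k, j, False) for k in range(i + 1, j))
                break
    return instances


def positive_ratio(instances):
    """Fraction of instances labeled coreferent."""
    return sum(label for _, _, label in instances) / len(instances)


# Gold entity label of each mention, in textual order (toy data).
entity_ids = ["A", "B", "A", "C", "B", "A", "D", "C"]

full = all_pairs(entity_ids)
sampled = closest_antecedent_pairs(entity_ids)
print(len(full), positive_ratio(full))
print(len(sampled), positive_ratio(sampled))
```

Even on eight mentions, exhaustive pairing yields far more negatives than positives, and the gap widens with document length. Whether reducing it actually helps depends on the learner, as the TiMBL result above suggests.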

Open-source software and experiments

See also Survey of open-source systems.


[1], Winograd schema.

"advanced model for CR"

See also

References

  1. Hirst, G. (1981). Anaphora in Natural Language Understanding: A Survey. Brown University.
  2. Winograd, T. (1972). Understanding natural language. Cognitive Psychology, 3(1), 1–191.
  3. Recasens, M., & Hovy, E. (2009). A deeper look into features for coreference resolution. Lecture Notes in Computer Science, 5847 LNAI, 29–42.
  4. McCarthy, J.F., Lehnert, W.G.: Using decision trees for coreference resolution. In: Proceedings of IJCAI. (1995) 1050–1055
  5. Steinberger, J., Poesio, M., Kabadjov, M.A., Ježek, K.: Two uses of anaphora resolution in summarization. Information Processing and Management 43(6) (2007) 1663–1680
  6. Morton, T.S.: Using coreference in question answering. In: Proceedings of the Text REtrieval Conference 8. (1999) 85–89
  7. Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational linguistics 27, no. 4 (2001): 521-544.
  8. Hajishirzi, H., Zilles, L., Weld, D. S., & Zettlemoyer, L. (2013). Joint Coreference Resolution and Named-Entity Linking with Multi-pass Sieves. In EMNLP ’13 (pp. 289–299).
  9. Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M., & Jurafsky, D. (2013). Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules. Computational Linguistics, 39(4), 885–916. doi:10.1162/COLI
  10. Martschat, S., & Strube, M. (2014). Recall Error Analysis for Coreference Resolution. EMNLP, 2070–2081.
  11. Olga Uryupina. 2008. Error analysis for learning-based coreference resolution. In Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 26 May – 1 June 2008, pages 1914–1919.
  12. Sebastian Martschat. 2013. Multigraph clustering for unsupervised coreference resolution. In 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Student Research Workshop, Sofia, Bulgaria, 5–7 August 2013, pages 81–88.
  13. Jonathan K. Kummerfeld and Dan Klein. 2013. Error-driven analysis of challenges in coreference resolution. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Wash., 18–21 October 2013, pages 265–277.
  14. Vincent Ng and Claire Cardie. 2002. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 104–111.
  15. Aria Haghighi and Dan Klein. 2009. Simple coreference resolution with rich syntactic and semantic features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1152–1161.
  16. Li, Yifan, Petr Musilek, Marek Reformat, and Loren Wyard-Scott. "Identification of pleonastic it using the web." Journal of Artificial Intelligence Research 34 (2009): 339-389.
  17. McShane, M., & Babkin, P. (2015). Resolving Difficult Referring Expressions, 1–21.
  18. Kolhatkar, V., & Hirst, G. (2011). Resolving “This-issue” Anaphora.
  19. Müller, M.-C. (2008). Fully Automatic Resolution of “it”, “this”, and “that” in Unrestricted Multi-Party Dialog.
  20. Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst. "Annotating Anaphoric Shell Nouns with their Antecedents." LAW@ACL. 2013.
  21. Wiseman, Sam, Alexander M. Rush, and Stuart M. Shieber. "Learning Global Features for Coreference Resolution." arXiv preprint arXiv:1604.03035 (2016).
  22. Sebastian Martschat and Michael Strube. 2015. Latent structures for coreference resolution. TACL, 3:405–418.
  23. Recasens, Marta, Matthew Can, and Daniel Jurafsky. "Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions." HLT-NAACL. 2013.
  24. M. Ariel. 1988. Referring and accessibility. Journal of Linguistics, pages 65–87.
  25. J. K. Gundel, N. Hedberg, and R. Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language, 69:274–307.
  26. Denis, P., & Baldridge, J. (2008). Specialized Models and Reranking for Coreference Resolution. Proceedings of the Conference on Empirical Methods in Natural Language Processing, (October), 660–669.
  27. Marjorie McShane, Petr Babkin. 2016. Resolving Difficult Referring Expressions.
  28. Cormack, S. 1993. Anaphora Resolution in Discourse Representation Theory. PhD thesis, University of Edinburgh.
  29. Abraços, J. and J.G. Lopes. 1994. Extending DRT with a focusing mechanism for pronominal anaphora and ellipsis resolution. Proceedings of the 15th conference on Computational linguistics-Volume 2, pages 1128–1132.
  30. Inoue, N., Ovchinnikova, E., Inui, K., & Hobbs, J. (2012). Coreference Resolution with ILP-based Weighted Abduction. In COLING (pp. 1291-1308).
  31. Elango, Pradheep. "Coreference resolution: A survey." University of Wisconsin, Madison, WI (2005).
  32. McCallum, A., & Wellner, B. (2005). Conditional Models of Identity Uncertainty with Application to Noun Coreference. Advances in Neural Information Processing Systems 17, 905–912.
  33. Rizzolo, N., & Roth, D. (2016). Integer Linear Programming for Coreference Resolution. In Anaphora resolution, Theory and Applications of Natural Language Processing (Vol. 11, pp. 315–343).
  34. Poon, H. & Domingos, P. (2008). Joint unsupervised coreference resolution with Markov Logic. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Waikiki, Honolulu, Hawaii, 25–27 October 2008, pages 650–659.
  35. Bögel, T. & Frank, A. (2013). A joint inference architecture for global coreference clustering with anaphoricity. In Gurevych, I., Biemann, C., & Zesch, T. (Eds.), Language Processing and Knowledge in the Web, pages 35–46. Berlin, Heidelberg: Springer (Lecture Notes in Computer Science, 8105).
  36. Clark, K. (2015). Neural Coreference Resolution.
  37. Clark, K., & Manning, C. D. (2016a). Improving Coreference Resolution by Learning Entity-Level Distributed Representations. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 643–653.
  38. Clark, K., & Manning, C. D. (2016b). Deep Reinforcement Learning for Mention-Ranking Coreference Models. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 2256–2262.
  39. Lappin, S., & Leass, H. J. (1994). An Algorithm for Pronominal Anaphora Resolution. Computational Linguistics, 20(4), 535–561.
  40. Ng, V. (2010). Supervised Noun Phrase Coreference Research: The First Fifteen Years. ACL ’10, (July), 1396–1411.
  41. Chang, K.-W., Samdani, R., & Roth, D. (2013). A Constrained Latent Variable Model for Coreference Resolution. In EMNLP.
  42. Pasula, H., Marthi, B., Milch, B., et al. (2003). Identity uncertainty and citation matching. Advances in Neural Information Processing Systems.
  43. Luo, X., Ittycheriah, A., Jing, H., et al. (2004). A mention-synchronous coreference resolution algorithm based on the Bell tree. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
  44. Yu, C.-N. J., & Joachims, T. (2009). Learning structural SVMs with latent variables. Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 1–8.
  45. Fernandes, E. R., dos Santos, C. N., & Milidiú, R. L. (2014). Latent Trees for Coreference Resolution. Computational Linguistics, 40(4).
  46. Anders Björkelund and Jonas Kuhn. 2014. Learning Structured Perceptrons for Coreference Resolution with Latent Antecedents and Non-local Features. ACL, Baltimore, MD, USA, June.
  47. For example, in Durrett and Klein (2013).
  48. Ryu Iida, Kentaro Inui, Hiroya Takamura, and Yuji Matsumoto. 2003. Incorporating contextual cues in trainable models for coreference resolution. In Proceedings of the EACL Workshop on The Computational Treatment of Anaphora.
  49. Xiaofeng Yang, Guodong Zhou, Jian Su, and Chew Lim Tan. 2003. Coreference resolution using competitive learning approach. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 176–183.
  50. Xiaofeng Yang, Jian Su, and Chew Lim Tan. 2008b. A twin-candidate model for learning-based anaphora resolution. Computational Linguistics, 34(3):327–356.
  51. Altaf Rahman and Vincent Ng. 2009. Supervised models for coreference resolution. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 968–977.
  52. Clark, Kevin, and Christopher D. Manning. "Entity-centric coreference resolution with model stacking." Association for Computational Linguistics (ACL). 2015.
  53. Webster, K., & Curran, J. R. (2014). Limited memory incremental coreference resolution. In COLING (pp. 2129–2139).
  54. Ng, V., & Cardie, C. (2002). Identifying anaphoric and non-anaphoric noun phrases to improve coreference resolution. Proceedings of the 19th International Conference on Computational Linguistics, 1–7.
  55. Lee, Heeyoung, Mihai Surdeanu, and Dan Jurafsky. "A scaffolding approach to coreference resolution integrating statistical and rule-based models." Natural Language Engineering (2017): 1-30.