Resources

Tools

  • Part-of-speech taggers
    • Adwait Ratnaparkhi's MXPOST (adwaittools), MXPOST extras for English (mxpost-extras)
    • Libin Shen's bidirectional POS tagger (bpos)
    • Stanford POS tagger (stanford-tagger)
  • Parsers
    • Dan Bikel's implementation of Mike Collins' parsing model (dbparser)
    • Libin Shen's incremental LTAG-Spinal parser (spinc)
    • Libin Shen's bidirectional LTAG-Spinal parser (binc)
    • Ryan McDonald's MSTParser (mstparser)
    • Stanford Parser (stanford-parser)
    • Dan Bikel's Switchboard parallelization framework (sb)
    • XTAG Tools (xtag)
  • Information Extraction
    • BioTagger (Biotagger)
    • Stanford Named Entity Recognizer (stanford-ner)
  • Tree Search and Manipulation
    • Tregex and TSurgeon (stanford-tregex)
  • Segmentation
    • Stanford Chinese Word Segmenter (stanford-chinese-segmenter)
  • Machine Learning
    • Structlearn (structlearn)
    • MALLET (mallet)
  • Discourse
    • Discourse Connectives Tagger (addDiscourse)

Corpora

  • Penn Treebank (wsj, brown)
  • LTAG-Spinal Treebank (../../tools/move_to_corpora/ltagtb)
  • Penn Discourse Treebank
  • Penn Arabic Treebank
  • Penn Chinese Treebank (ctb)
  • Prague Dependency Treebank (Czech)
  • Prague Czech-English Treebank
  • Prague Arabic Dependency Treebank
  • CHILDES
  • New York Times Annotated Corpus

APIs

  • LTAG-Spinal Java API (spinalapi.jar)

Grammars

  • XTAG Grammar (tools/xtag/english)

Other Resources

  • cmudict pronunciation dictionary (cmudict.0.6)

Datasets

  • Image spam (spam_images)
  • Multidomain Sentiment (sentiment)

Other

  • Dan Bikel's statistical significance tester for evalb (compare)
© 2008 Penn Natural Language Processing | All Rights Reserved