Notes on "Natural Language Processing with PyTorch", Chapter 2: A Quick Tour of Traditional NLP

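Most of the theory in this chapter is already covered in my notes on the introduction lecture by Professor Yue Zhang of Westlake University. This post focuses on hands-on practice with spaCy and NLTK.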

Table of Contents

  • Notes on "Natural Language Processing with PyTorch", Chapter 2: A Quick Tour of Traditional NLP
  • Corpora, Tokens, and Types
    • Tokenization
      • Tokenization with NLTK and spaCy
        • spaCy
        • NLTK
  • Unigrams, Bigrams, Trigrams, …, N-grams
  • Lemmas and Stems
    • Lemmas
    • Stems
  • Categorizing Sentences and Documents
  • Categorizing Words: POS Tagging
  • Categorizing Spans: Chunking and Named Entity Recognition
    • Chunking
    • NER
  • Structure of Sentences
  • Word Senses and Semantics
  • Summary
  • References