Course Outline

Detailed training outline

  1. Introduction to NLP
    • Understanding NLP
    • NLP Frameworks
    • Commercial applications of NLP
    • Scraping data from the web
    • Working with various APIs to retrieve text data
    • Working with and storing text corpora: saving content and relevant metadata
    • Advantages of using Python, and an NLTK crash course
  2. Practical Understanding of a Corpus and Dataset
    • Why do we need a corpus?
    • Corpus Analysis
    • Types of data attributes
    • Different file formats for corpora
    • Preparing a dataset for NLP applications
  3. Understanding the Structure of Sentences
    • Components of NLP
    • Natural language understanding
    • Morphological analysis: stem, word, token, part-of-speech tags
    • Syntactic analysis
    • Semantic analysis
    • Handling ambiguity
  4. Text Data Preprocessing
    • Corpus: raw text
      • Sentence tokenization
      • Stemming for raw text
      • Lemmatization of raw text
      • Stop word removal
    • Corpus: raw sentences
      • Word tokenization
      • Word lemmatization
    • Working with Term-Document/Document-Term matrices
    • Text tokenization into n-grams and sentences
    • Practical and customized preprocessing
  5. Analyzing Text Data
    • Basic features of NLP
      • Parsers and parsing
      • POS tagging and taggers
      • Named entity recognition
      • N-grams
      • Bag of words
    • Statistical features of NLP
      • Concepts of linear algebra for NLP
      • Probability theory for NLP
      • TF-IDF
      • Vectorization
      • Encoders and Decoders
      • Normalization
      • Probabilistic Models
    • Advanced feature engineering and NLP
      • Basics of word2vec
      • Components of word2vec model
      • Logic of the word2vec model
      • Extension of the word2vec concept
      • Application of word2vec model
    • Case study: applying bag of words to automatic text summarization using the simplified and true versions of Luhn's algorithm
  6. Document Clustering, Classification and Topic Modeling
    • Document clustering and pattern mining (hierarchical clustering, k-means clustering, etc.)
    • Comparing and classifying documents using TF-IDF, Jaccard and cosine distance measures
    • Document classification using Naïve Bayes and Maximum Entropy
  7. Identifying Important Text Elements
    • Reducing dimensionality: Principal Component Analysis, Singular Value Decomposition and Non-negative Matrix Factorization
    • Topic modeling and information retrieval using Latent Semantic Analysis
  8. Entity Extraction, Sentiment Analysis and Advanced Topic Modeling
    • Positive vs. negative: degree of sentiment
    • Item Response Theory
    • Part-of-speech tagging and its application: finding people, places and organizations mentioned in text
    • Advanced topic modeling: Latent Dirichlet Allocation
  9. Case Studies
    • Mining unstructured user reviews
    • Sentiment classification and visualization of product review data
    • Mining search logs for usage patterns
    • Text classification
    • Topic modeling
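
As a rough illustration of how several outline topics fit together (tokenization, stop word removal, bag of words, TF-IDF and cosine similarity from modules 4 to 6), here is a minimal sketch in plain Python. It is not taken from the course materials: the stop-word list, the sample sentences and the function names tokenize, tf_idf and cosine are all assumptions made for this example, and the weighting used is one common TF-IDF variant (term frequency times log(N / document frequency)).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "on", "in", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    bags = [Counter(tokenize(d)) for d in docs]         # bag of words per document
    n_docs = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency per term
    return [
        {
            term: (count / sum(bag.values())) * math.log(n_docs / df[term])
            for term, count in bag.items()
        }
        for bag in bags
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical three-document corpus.
docs = [
    "The cat sat on the mat",
    "A cat lay on a mat",
    "Stock prices fell sharply",
]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[1]))  # the two cat sentences share terms
print(cosine(vectors[0], vectors[2]))  # no shared terms
```

In practice a library such as NLTK or scikit-learn would normally handle these steps; the hand-written version only makes the arithmetic visible.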

Requirements

Knowledge and awareness of NLP principles and an appreciation of AI applications in business

  21 Hours
 


Dates are subject to availability; sessions run between 9:30 am and 4:30 pm.
Open Training Courses require 5+ participants.
