Practice

Named Entity Recognition (NER)

You are provided with a set of news articles. Your task is to perform NER on the text and identify different named entities such as people's names, locations, organizations, and dates. Use tokenization, text preprocessing techniques, and NER to complete this task.

Task: Perform Named Entity Recognition on a set of news articles.

Dataset:

SentenceID  Sentence
1           Kibo is headquartered in New York.
2           J.K. Rowling is the author of Harry Potter.
3           The Eiffel Tower is located in Paris, France.
4           Google's CEO, Sundar Pichai, addressed the audience.
5           The river Nile flows through Egypt.
6           Microsoft Corporation is based in Redmond, WA.
7           William Shakespeare wrote Romeo and Juliet.
8           The Great Wall of China is a famous landmark.
9           Angela Merkel is the Chancellor of Germany.
10          The Amazon River flows through South America.
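
The exercise does not say what file format the dataset comes in, so the following is only a sketch that assumes a tab-separated layout with the two columns shown above. The names RAW and sentences are illustrative and not part of the assignment.

```python
import csv
import io

# Assumed tab-separated copy of the dataset table (the real file format is not specified).
RAW = (
    "SentenceID\tSentence\n"
    "1\tKibo is headquartered in New York.\n"
    "2\tJ.K. Rowling is the author of Harry Potter.\n"
    "3\tThe Eiffel Tower is located in Paris, France.\n"
    "4\tGoogle's CEO, Sundar Pichai, addressed the audience.\n"
    "5\tThe river Nile flows through Egypt.\n"
    "6\tMicrosoft Corporation is based in Redmond, WA.\n"
    "7\tWilliam Shakespeare wrote Romeo and Juliet.\n"
    "8\tThe Great Wall of China is a famous landmark.\n"
    "9\tAngela Merkel is the Chancellor of Germany.\n"
    "10\tThe Amazon River flows through South America.\n"
)

# Parse the rows and key each sentence by its integer SentenceID.
rows = csv.DictReader(io.StringIO(RAW), delimiter="\t")
sentences = {int(row["SentenceID"]): row["Sentence"] for row in rows}

print(len(sentences), "sentences loaded")
```

A tab delimiter is used here because several sentences contain commas, which would otherwise need quoting in a comma-separated file.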

TODO:

  • Load the news article dataset.
  • Preprocess the text by converting it to lowercase, removing punctuation, and trimming whitespace.
  • Tokenize the preprocessed text.
  • Use a pre-trained NER model (e.g., spaCy) to identify named entities.
  • Extract and categorize the identified named entities (e.g., people, locations, organizations).
  • Analyze the frequency of different named entities in the dataset.
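
The steps above could be sketched roughly as follows. This is one possible arrangement, not a reference solution: the function names are made up for illustration, and the model name en_core_web_sm is an assumption (any installed spaCy English pipeline would do). The sketch runs NER on the original sentences rather than the preprocessed ones, on the assumption that pre-trained models often rely on capitalization and punctuation; the preprocessed text is used for tokenization only.

```python
import string
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, remove ASCII punctuation, and trim/collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def tokenize(text):
    """Split on whitespace; sufficient once punctuation has been removed."""
    return text.split()


def extract_entities(nlp, texts):
    """Return (entity text, entity label) pairs found by a spaCy pipeline."""
    return [(ent.text, ent.label_) for doc in nlp.pipe(texts) for ent in doc.ents]


def categorize(entities):
    """Group entity strings by their label (for example PERSON, ORG, GPE)."""
    by_label = defaultdict(list)
    for text, label in entities:
        by_label[label].append(text)
    return dict(by_label)


def entity_frequencies(entities):
    """Count how often each entity string occurs."""
    return Counter(text for text, _ in entities)


def run(texts, model_name="en_core_web_sm"):
    """Hypothetical driver: tokenize, tag, categorize, and count."""
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.load(model_name)  # assumes the model has been downloaded
    tokens = [tokenize(preprocess(t)) for t in texts]
    entities = extract_entities(nlp, texts)  # NER on the original casing
    return tokens, categorize(entities), entity_frequencies(entities)
```

Calling run(...) with the ten sentences requires spaCy and the chosen model to be installed (for example via python -m spacy download en_core_web_sm). The labels you get back depend on the model, so it is worth comparing the results on original and preprocessed text as part of your analysis.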

Submission

You are required to submit documentation for practice exercises over the course of the term. Each one will count for 1/10 of your practice grade, or 2% of your overall grade.

  • Practice exercises are graded for completion, not correctness.
  • You must document that you did the work, but we will not check whether you got it right.
  • You MUST upload your analysis/visuals as a single file to Practice - NLP on Gradescope after the exercise to receive credit for it.

Happy practicing!