Practice

Named Entity Recognition (NER)

You are provided with a set of news articles. Your task is to perform NER on the text and identify different named entities such as people's names, locations, organizations, and dates. Use tokenization, text preprocessing techniques, and NER to complete this task.

Task: Perform Named Entity Recognition on a set of news articles.

Dataset:

SentenceID	Sentence
1	Kibo is headquartered in New York.
2	J.K. Rowling is the author of Harry Potter.
3	The Eiffel Tower is located in Paris, France.
4	Google's CEO, Sundar Pichai, addressed the audience.
5	The river Nile flows through Egypt.
6	Microsoft Corporation is based in Redmond, WA.
7	William Shakespeare wrote Romeo and Juliet.
8	The Great Wall of China is a famous landmark.
9	Angela Merkel is the Chancellor of Germany.
10	The Amazon River flows through South America.
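
One way to load this table, assuming it is saved as a tab-separated file, is Python's standard `csv` module. The sketch below is only an illustration: the file name `news_sentences.tsv` and the helper `load_sentences` are hypothetical, and the first three rows are embedded as a string so the example runs on its own.

```python
import csv
import io

# First three rows of the dataset, tab-separated, embedded for illustration.
# In practice, open the real file instead (the file name is an assumption):
#     with open("news_sentences.tsv", newline="", encoding="utf-8") as f:
#         sentences = load_sentences(f)
SAMPLE_TSV = (
    "SentenceID\tSentence\n"
    "1\tKibo is headquartered in New York.\n"
    "2\tJ.K. Rowling is the author of Harry Potter.\n"
    "3\tThe Eiffel Tower is located in Paris, France.\n"
)


def load_sentences(handle):
    """Return a list of (sentence_id, sentence) pairs from a TSV file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(int(row["SentenceID"]), row["Sentence"]) for row in reader]


sentences = load_sentences(io.StringIO(SAMPLE_TSV))
for sentence_id, text in sentences:
    print(sentence_id, text)
```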

TODO:

Load the news article dataset.
Preprocess the text by converting it to lowercase, removing punctuation, and trimming whitespace.
Tokenize the preprocessed text.
Use a pre-trained NER model (e.g., spaCy) to identify named entities.
Extract and categorize the identified named entities (e.g., people, locations, organizations).
Analyze the frequency of different named entities in the dataset.
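
The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution. All function names (`preprocess`, `tokenize`, `extract_entities`, `summarize`, `run_ner_demo`) are hypothetical, and the NER part assumes that spaCy and its `en_core_web_sm` model are installed. The sketch also assumes that NER is run on the original sentences rather than the lowercased ones, since pre-trained English models generally rely on capitalization cues. The preprocessed tokens are kept as a separate output.

```python
import string
from collections import Counter

# Three sentences from the dataset, used here as a small sample.
SENTENCES = [
    "Kibo is headquartered in New York.",
    "Google's CEO, Sundar Pichai, addressed the audience.",
    "Microsoft Corporation is based in Redmond, WA.",
]


def preprocess(text):
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def tokenize(text):
    """Split preprocessed text into whitespace-delimited tokens."""
    return text.split()


def extract_entities(nlp, sentences):
    """Run a spaCy pipeline over raw sentences; return (text, label) pairs."""
    entities = []
    for doc in nlp.pipe(sentences):
        entities.extend((ent.text, ent.label_) for ent in doc.ents)
    return entities


def summarize(entities):
    """Count how often each label and each (text, label) pair occurs."""
    label_counts = Counter(label for _, label in entities)
    entity_counts = Counter(entities)
    return label_counts, entity_counts


def run_ner_demo():
    """Requires spaCy and the en_core_web_sm model (assumed installed)."""
    import spacy

    nlp = spacy.load("en_core_web_sm")
    entities = extract_entities(nlp, SENTENCES)
    label_counts, entity_counts = summarize(entities)
    print(label_counts.most_common())
    print(entity_counts.most_common(5))


# The preprocessing and tokenization steps need only the standard library.
for sentence in SENTENCES:
    print(tokenize(preprocess(sentence)))
```

Calling `run_ner_demo()` once spaCy is set up prints the label and entity frequencies. The labels that appear (for example `PERSON`, `ORG`, `GPE`) come from the model's own annotation scheme, so mapping them to "people", "organizations", and "locations" is a separate step left to the analysis.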

Submission

You are required to submit documentation for practice exercises over the course of the term. Each one will count for 1/10 of your practice grade, or 2% of your overall grade.

Practice exercises will be graded for completion, not correctness.
You must document that you did the work, but we will not check whether your answers are right.
To receive credit for this exercise, you MUST upload your analysis/visuals as a single file to Practice - NLP on Gradescope after completing it.

`Happy practicing!`