Course Overview
Welcome to Data Structures and Algorithms 2!
Course Description
This course builds on Data Structures & Algorithms 1. Students will explore non-linear data structures, and implement and analyze advanced algorithms.
The course begins with a brief review of basic data structures and algorithms. Students learn about advanced data structures including heaps, priority queues, and hash tables. Students then build on their knowledge of graph theory to implement graph algorithms, exploring topics like finding shortest paths in graphs and applying these algorithms to maps, social networks, and other real-life problems. Other key topics include divide and conquer, recursion, greedy algorithms, dynamic programming, computability theory, and case studies in algorithm design.
The course emphasizes big-picture understanding and practical problem-solving in preparation for technical interviews and professional practice. Students will solve common algorithmic problems, and optionally participate in mock interview sessions.
Topics
- Hash tables
- Heaps and priority queues
- State-space search
- Greedy algorithms
- Graphs
- Computability
- Divide-and-conquer algorithms
- Dynamic programming
How the course works
There are multiple ways you'll learn in this course:
- Read and engage with the materials on this site
- Attend live class and complete the activities in class
- Answer practice exercises to try out the concepts
- Complete assignments and projects to demonstrate what you have learned
Active engagement is necessary for success in the course! You should try to write lots of programs, so that you can explore the concepts in a variety of ways.
You are encouraged to seek out additional practice problems outside of the practice problems included in the course.
Learning Outcomes
By the end of the course, students will be able to:
- Describe the implementation and performance of fundamental data structures
- Describe underlying data structures upon which more complex structures are built
- Implement and evaluate runtime of a variety of algorithms, including divide-and-conquer, greedy, and dynamic programming algorithms
- Solve common algorithmic technical interview problems
Instructor
- Cody Doucette
- cody.doucette@kibo.school
Please contact the instructor on Discord first with questions about the course.
Live Class Time
- Thursdays, 6:00 PM GMT
Office Hours
- Fridays, 7:00 PM GMT
Core Reading List
- Bible, P., Moser, L., & Scarlato, M. (2023). An Open Guide to Data Structures and Algorithms. PALNI.
- Miller, B., Ranum, D. (2013). Problem Solving with Algorithms and Data Structures using Python. Franklin Beedle Publishers. Chapters 6-9 (180pp)
- Goodrich, M. (2013). Data Structures & Algorithms in Python. Chapters 8-15 (418pp)
Supplemental Reading List
- Skiena, S. (2011). The Algorithm Design Manual. Springer; 2nd edition
- Skiena, S. (2020) Analysis of Algorithms, videos and Lecture Notes
- Roughgarden, T. (2017). Algorithms Illuminated Series. Soundlikeyourself Publishing
- Roughgarden, T. (2017). Algorithms 1 (YouTube)
- Leetcode. Practice Problems
Live Classes
Each week, you will have a live class (see course overview for time). You are required to attend the live class sessions.
Video recordings and resources for the class will be posted after the classes each week. If you have technical difficulties or are occasionally unable to attend the live class, please be sure to watch the recording as quickly as possible so that you do not fall behind.
Week | Topic | Live Class | Materials |
---|---|---|---|
1 | Python and Data Structures | YouTube | Slides |
2 | Hash Tables | YouTube | Slides |
3 | Priority Queues and Heaps | YouTube | Slides |
4 | Dynamic Programming | YouTube | Slides |
5 | Midterm Week | No class | |
6 | Graphs I | YouTube | Slides |
7 | Graphs II | YouTube | Slides |
8 | Advanced Algorithms | YouTube | Slides |
9 | Intractable Problems | YouTube | Slides |
10 | Final Exam Week | No class | |
Assessments & Grading
Your overall course grade is composed of these weighted elements:
- Weekly Quizzes (8 total): 10%
- Programming Assignments (5 total): 40%
- Midterm Exam: 25%
- Final Exam: 25%
Weekly Quizzes
In each week that we have new material, there will be a short quiz in Gradescope on the material covered in the lessons. These quizzes are due shortly before the start of the live class. No late submissions are accepted.
These quizzes will make up 10% of your overall course grade.
- Quizzes will be released in Gradescope each week.
- Quizzes are due shortly before the start of each week's live class.
- Submit each quiz in Gradescope, then upload a PDF of your quiz to Anchor.
Programming Assignments
During some weeks, you'll be given an assignment to test the new skills and concepts you learned. The assignments will be challenging, and it's recommended that you complete the material and practice problems before you start the assignment each week.
There will be five assignments overall.
Midterm Exam
There will be a midterm exam held during week 5 of the term, covering the material in weeks 1-4.
Final Exam
There will be a final exam held during week 10 of the term, covering the material in weeks 6-9.
Late Policy
Programming assignments can be submitted late under Kibo's late policy:
- Submissions within two weeks of the deadline incur a 10% penalty.
- Submissions beyond two weeks late incur a 20% penalty.
The final submission deadline is the last day of the academic term.
Note that the late policy only applies to programming assignments. Quizzes must be submitted before the live class each week, and no late submissions will be accepted. The midterm exam and final exam must be taken at the assigned time.
Students facing exceptional circumstances in terms of submitting work on time should immediately contact their Kibo advisor.
Getting Help
If you have any trouble understanding the concepts or are stuck on a problem, we expect you to reach out for help!
Below are the different ways to get help in this class.
Discord Channel
The first place to go is always the course's help channel on Discord. Share your question there so that your instructor and your peers can help as soon as possible. Peers should jump in and help answer questions (see the Getting and Giving Help sections for some guidelines).
Message your Instructor on Discord
If your question doesn't get resolved within 24 hours on Discord, you can reach out to your instructor directly via Discord DM or Email.
Office Hours
There will be weekly office hours with your Instructor and your TA. Please make use of them!
Tips on Asking Good Questions
Asking effective questions is a crucial skill for any computer science student. Here are some guidelines to help structure questions effectively:
- Be Specific:
  - Clearly state the problem or concept you're struggling with.
  - Avoid vague or broad questions. The more specific you are, the easier it is for others to help.
- Provide Context:
  - Include relevant details about your environment, programming language, tools, and any error messages you're encountering.
  - Explain what you're trying to achieve and any steps you've already taken to solve the problem.
- Show Your Work:
  - If your question involves code, provide a minimal, complete, verifiable, and reproducible example (an "MCVE") that demonstrates the issue.
  - Highlight the specific lines or sections where you believe the problem lies.
- Highlight Error Messages:
  - If you're getting error messages, include them in your question. Understanding the error is often crucial to finding a solution.
- Research First:
  - Demonstrate that you've made an effort to solve the problem on your own. Share what you've found in your research and explain why it didn't fully solve your issue.
- Use Clear Language:
  - Clearly articulate your question. Avoid jargon or overly technical terms if you're unsure of their meaning.
  - Proofread your question to ensure it's grammatically correct and easy to understand.
- Be Patient and Respectful:
  - Be patient while waiting for a response.
  - Show gratitude when someone helps you, and be open to feedback.
- Ask for Understanding, Not Just Solutions:
  - Instead of just asking for the solution, try to understand the underlying concepts. This will help you learn and become more self-sufficient in problem-solving.
- Provide Updates:
  - If you make progress or find a solution on your own, share it with those who are helping you. It not only shows gratitude but also helps others who might have a similar issue.
Remember, effective communication is key to getting the help you need both in school and professionally. Following these guidelines will not only help you in receiving quality assistance but will also contribute to a positive and collaborative community experience.
Screenshots
It’s often helpful to include a screenshot with your question. Here’s how:
- Windows: press the Windows key + Print Screen key
- the screenshot will be saved to the Pictures > Screenshots folder
- alternatively: press the Windows key + Shift + S to open the snipping tool
- Mac: press the Command key + Shift key + 4
- it will save to your desktop, and show as a thumbnail
Giving Help
Providing help to peers in a way that fosters learning and collaboration while maintaining academic integrity is crucial. Here are some guidelines that a computer science university student can follow:
- Understand University Policies:
  - Familiarize yourself with Kibo's Academic Honesty and Integrity Policy. This policy is designed to protect the value of your degree, which is ultimately determined by the ability of our graduates to apply their knowledge and skills to develop high-quality solutions to challenging problems -- not by their grades!
- Encourage Independent Learning:
  - Rather than giving direct answers, guide your peers to resources, references, or methodologies that can help them solve the problem on their own. Encourage them to understand the concepts rather than just finding the correct solution. Work through examples that are different from the assignments or practice problems provided in the course to demonstrate the concepts.
- Collaborate, Don't Complete:
  - Collaborate on ideas and concepts, but avoid completing assignments or projects for others. Provide suggestions, share insights, and discuss approaches without doing the work for them or showing your work to them.
- Set Boundaries:
  - Make it clear that you're willing to help with understanding concepts and problem-solving, but you won't assist in any activity that violates academic integrity policies.
- Use Group Study Sessions:
  - Participate in group study sessions where everyone can contribute and learn together. This way, ideas are shared, but each individual is responsible for their own understanding and work.
- Be Mindful of Collaboration Tools:
  - If using collaboration tools like version control systems or shared documents, make sure that contributions are clear and well-documented. Clearly delineate individual contributions to avoid confusion.
- Refer to Resources:
  - Direct your peers to relevant textbooks, online resources, or documentation. Learning to find and use resources is an essential skill, and guiding them toward these materials can be immensely helpful both in the moment and in their careers.
- Ask Probing Questions:
  - Instead of providing direct answers, ask questions that guide your peers to think critically about the problem. This helps them develop problem-solving skills.
- Be Transparent:
  - If you're unsure about the appropriateness of your assistance, it's better to seek guidance from professors or teaching assistants. Be transparent about the level of help you're providing.
- Promote Honesty:
  - Encourage your peers to take pride in their work and to be honest about the level of help they received. Acknowledging assistance is a key aspect of academic integrity.
Remember, the goal is to create an environment where students can learn from each other (after all, we are better together) while we develop our individual skills and understanding of the subject matter.
Academic Integrity
When you turn in any work that is graded, you are representing that the work is your own. Copying work from another student or from an online resource and submitting it is plagiarism. Using generative AI tools such as ChatGPT to help you understand concepts (i.e., as though it is your own personal tutor) is valuable. However, you should not submit work generated by these tools as though it is your own work. Remember, the activities we assign are your opportunity to prove to yourself (and to us) that you understand the concepts. Using these tools to generate answers to assignments may help you in the short-term, but not in the long-term.
As a reminder of Kibo's academic honesty and integrity policy: Any student found to be committing academic misconduct will be subject to disciplinary action including dismissal.
Disciplinary action may include:
- Failing the assignment
- Failing the course
- Dismissal from Kibo
For more information about what counts as plagiarism and tips for working with integrity, review the "What is Plagiarism?" Video and Slides.
The full Kibo policy on Academic Honesty and Integrity Policy is available here.
Course Tools
In this course, we are using these tools to work on code. If you haven't set up your laptop and installed the software yet, follow the guide in https://github.com/kiboschool/setup-guides.
- GitHub is a website that hosts code. We'll use it as a place to keep our project and assignment code.
- GitHub Classroom is a tool for assigning individual and team projects on Github.
- Visual Studio Code is an Integrated Development Environment (IDE) that has many plugins which can extend the features and capabilities of the application. Take time to learn how to use VS Code (and other key tools), because you will ultimately save enormous amounts of time.
- Anchor is Kibo's Learning Management System (LMS). You will access your course content through this website, track your progress, and see your grades through this site.
- Gradescope is a grading platform. We'll use it to track assignment submissions and give you feedback on your work.
- Woolf is our accreditation partner. We'll track work there too, so that you get credit towards your degree.
Python and Data Structures Review
Welcome to week 1!
Learning objectives
By the end of this week, you will be able to:
- recall the differences between arrays and linked data structures in terms of memory allocation, insertion, and deletion operations.
- analyze algorithms for their efficiencies in terms of both time and space.
- develop iterative and recursive algorithms that read and modify linked data structures.
- apply Python's built-in data structures to efficiently solve problems.
What's due this week
- Week 1 Quiz (on Gradescope), due Thursday before class starts
- Assignment 1, due Sunday
Data Structures
Before continuing our study of data structures and algorithms, let's first think back to what we learned in the previous course. There, we learned about a few different linear data structures, including arrays, linked lists, stacks, and queues.
We think of arrays and linked lists as building block data structures; we often use them to implement other data structures with some kind of abstraction, such as stacks and queues.
How we choose to implement data structures, including deciding between using an array or a linked list, depends on what properties we are looking for and what efficiency is required. Check out the video below to remind yourself about these data structures:
To summarize:
- Big-O notation provides us with a language to analyze the efficiency of algorithms, both in terms of time and space. Big-O classes include `O(1)` (constant), `O(log n)`, `O(n)`, `O(n^2)`, and `O(c^n)` (in order from fastest to slowest).
- An array is a fixed-size collection of items that provides random access to its elements.
- A linked list is an extensible collection of items that does not provide random access to its elements.
- A queue is a data structure with a first-in, first-out (FIFO) abstraction.
- A stack is a data structure with a last-in, first-out (LIFO) abstraction.
Data structures in Python
As we learn about these data structures, we implement them in Python or use existing Python constructs.
Array
Python doesn't have a built-in array data type. However, it does have a list data type, which we use to simulate an array. Under the hood, Python lists are actually implemented as arrays, so we have to be careful and recognize that occasionally operations on a Python list might be expensive (for example, inserting an item in the middle of a list).
Linked list
When we need to use a linked list, we create a special `Node` class that contains at least a piece of data and a reference to the next node in the list. We then create instances of the `Node` class and connect them to form a linked list.
For example, here's a portion of the `LLString` class that we used in a previous semester, which includes `Node` as a nested, inner class and also has a `prepend()` method to add a node to the head of the list.
```python
class LLString:
    class Node:
        def __init__(self, val, next):
            self.val = val
            self.next = next

    def __init__(self):
        self.head = None

    def prepend(self, new_val):
        new_node = LLString.Node(new_val, self.head)
        self.head = new_node
```
Queue
Python provides a `deque` object in its `collections` library that we can use as a queue data structure. A `deque` (pronounced "deck"; it stands for double-ended queue) provides `O(1)` enqueue and dequeue operations via the `append()` and `popleft()` methods, respectively:
Stack
The easiest way to simulate a stack in Python is to use a list. To push items onto the stack, we can use the `append()` method, and to pop items from the top of the stack we can use the (you guessed it) `pop()` method. Both of these are `O(1)` operations.
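For example, here's a quick sketch of both in action (the values are just for illustration):

```python
from collections import deque

# Queue: enqueue with append(), dequeue with popleft() -- both O(1)
queue = deque()
queue.append('a')
queue.append('b')
print(queue.popleft())   # 'a' (first in, first out)

# Stack: push with append(), pop with pop() -- both O(1)
stack = []
stack.append('a')
stack.append('b')
print(stack.pop())       # 'b' (last in, first out)
```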
Arrays vs. Linked Lists
A classic tradeoff in computer science is whether to implement your data structure using an array or a linked data structure. Let's remind ourselves of some of the space and time tradeoffs between the two, since we will continue to use them in building new data structures in the coming weeks.
Watch the following video to remind yourself of the similarities and differences between arrays and linked lists:
To summarize:
Arrays
- Allow random (`O(1)`) access to elements, which makes them amenable to techniques like binary search (since it requires jumping to specific points in the array)
- Need to be stored in contiguous memory, so they are typically not resizable
- If you need to add more items to the array than its capacity would allow, you would need to create a new, larger array and copy all of the elements over (`O(n)`)
Linked lists
- Linked lists don't allow random access to elements; to find an item in the list, you generally must start at the head of the list and traverse as far as needed (`O(n)`)
- Linked lists don't need to be stored in contiguous memory. Instead, individual nodes can be stored anywhere on the memory heap, with pointers connecting nodes. Therefore, linked lists are not fixed-length
- Adding to the head or tail of the list is generally `O(1)`
Efficiency and Big-O Notation
With every new data structure and algorithm that we have learned, we have also discussed the efficiency implications. We will continue to do so in this course, and will also learn new ways of evaluating the efficiency of algorithms in terms of time and space.
We have been using big-O notation to describe efficiency. To remind yourself of the basics of big-O notation, watch the video below:
To summarize:
- Big-O notation is a measure of an algorithm's efficiency
- It describes the efficiency as a function of the size of the input, `n`
- When given an expression that represents the number of operations of an algorithm, the corresponding big-O notation can be calculated by ignoring lower-order terms and constants
- Other heuristics can be used to derive big-O expressions, such as analyzing loops
- It's important to remember that big-O notation is a simplified way of describing efficiency that focuses on the worst-case behavior of an algorithm; in practice, we care about all cases
Recursion
Many problems in computer science can be solved by repeatedly breaking down a problem into a slightly smaller problem, since once you know the solution to the smaller problem, it's easier to solve the original problem.
In the previous course, we learned about how to use recursion to solve such problems. We saw that recursion is a natural and elegant way to write algorithms that operate on data structures such as arrays, linked lists, and trees.
Watch the video below to remind yourself about how to think recursively and write a recursive function:
Recursion on data structures
Let's now take a look at the `length()` method, which uses recursion to find the length of a linked list:
```python
class LinkedList:
    class Node:
        def __init__(self, val, next):
            self.val = val
            self.next = next

    def __init__(self):
        self.head = None

    def __length(self, head):
        if head is None:
            return 0
        length_of_rest = self.__length(head.next)
        return length_of_rest + 1

    def length(self):
        return self.__length(self.head)
```
This is the style in which we have been writing recursive functions. Notice a few things in particular:
- We have a recursive helper method, which uses a parameter `head` to track where we currently are in the list.
- We have a base case that checks whether `head` is `None`; if it is, we have reached the end of the list.
- We make a recursive call, passing in `head.next` to advance to the next node in the list. We store the return value in a variable `length_of_rest`.
- Once we know the length of the rest of the list, we simply need to add one and return the result.
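As a quick illustration (using made-up values), we can build a small list by hand and call the method:

```python
# Build the list 3 -> 5 -> 7 by linking nodes directly, then measure it.
lst = LinkedList()
lst.head = LinkedList.Node(3, LinkedList.Node(5, LinkedList.Node(7, None)))
print(lst.length())   # 3
```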
Dictionaries and trees
At the very end of the previous course, we left off by describing the concept of data dictionaries -- data structures that are purpose-built for storing and looking-up data. There are many different kinds of data dictionaries, and many forms of search algorithms, including linear search, binary search, and binary search trees.
Linear search
Perhaps the simplest way of implementing a data dictionary would be to store all of our data in an array, and iterate over that array when searching for an item. However, if we don't know anything about the position of a desired item, then we have no choice except to look at all items, which makes linear search an `O(n)` algorithm.
Binary search
However, we can search much faster if we keep the items of the array in sorted order. If the items are ordered, then we can use binary search to more quickly "jump" to positions in the array. This significantly cuts down on the number of items we need to inspect when searching, which makes binary search an `O(log n)` algorithm.
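For reference, here's a standard iterative binary search on a sorted Python list, illustrating the `O(log n)` lookup described above (a generic sketch, not code from the course materials):

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid            # found: return the index
        elif items[mid] < target:
            low = mid + 1         # target must be in the right half
        else:
            high = mid - 1        # target must be in the left half
    return -1                     # not present

print(binary_search([3, 8, 15, 23, 42], 15))  # 2
```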
However, keeping the items of the array in sorted order comes at a cost. When we need to insert a new item, we need to move all elements that come after the newly-inserted item. This takes `O(n)` time, which is not ideal.
Binary search trees
In order to achieve `O(log n)` lookup time in addition to `O(log n)` insertion time, we can use binary search trees. By using a linked data structure that branches at each junction, we can more quickly perform insertions while still getting the benefits of binary search.
With balanced binary search trees, we can additionally guarantee logarithmic time lookup and insertion algorithms. Otherwise, "standard" BSTs can become unbalanced in some cases, leading to linear time complexities.
What's next?
With linear search, binary search, and binary search trees, our current understanding of the different implementations of data dictionaries at the moment is summarized by this table:
data structure | searching for an item | inserting an item |
---|---|---|
a list implemented using an array | O(log n) using binary search | O(n) |
a list implemented using a linked list | O(n) using linear search | O(n) |
binary search tree | O(log n) in best and average cases; O(n) in worst case | O(log n) in best and average cases; O(n) in worst case |
balanced binary search trees (e.g., 2-3 trees) | O(log n) in all cases | O(log n) in all cases |
Logarithmic algorithms are quite fast for data dictionaries, and are certainly a significant improvement over linear algorithms. However, we can actually do even better with the help of hash tables. Believe it or not, we will actually be able to implement search and insertion in `O(1)` time! This will be the topic of our next week of study.
Week 1 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Assignment 1
📌 This is an individual assignment. You are expected to work independently.
If you get stuck, confused, or have trouble with the assignment, you should use the #help-dsa2 channel in Discord or message an instructor. Remember not to spoil the assignment for others - do not post code samples from your work to public spaces.
Python and Data Structures Primer
In this assignment, you will use the data structures knowledge and Python skills that you have gained from previous courses to implement a series of functions.
▶️ Access the assignment on GitHub Classroom
Remember...
- Read the instructions carefully
- Plan before you code
- Debug if you aren't getting the desired output
- Attend office hours if you need additional support
- Ask for help in Discord
Statement on Usage of AI Tools
The list below details acceptable and unacceptable uses of AI tools in Kibo courses, including generative AI tools like ChatGPT and AI tools integrated into code editors like CoPilot. For use cases not listed below, inquire about their acceptability on the course's Discord Help channel.
Acceptable Uses:
- Generating Ideas: Assisting in brainstorming topics for essays, projects, or presentations.
- Research Aid: Helping to find sources, papers, or literature for further reading and reference.
- Explaining Concepts: Providing explanations of complex concepts in simpler terms or different perspectives.
- Outlining: Aiding in creating outlines of topics to build structure for essays or reports.
- Drafting Assistance: Creating initial drafts of supplementary writing material where the main artifact being produced is not a written item.
- Creative Inspiration: Generating creative ideas for projects in arts, literature, or design.
- Technical Guidance: Offering guidance on coding problems, mathematical equations, or scientific principles.
- Proofreading: Assisting in grammar and spelling checks for written assignments.
Unacceptable Uses:
- AI-Generated Work: Submitting work significantly generated by AI as one's own in essays, reports, code, or other assignments.
- Plagiarism: Using AI to rephrase or copy content without proper citation or acknowledgment.
- Test and Exam Assistance: Using AI tools during tests or exams unless explicitly allowed by the instructor.
- Drafting Assistance: Creating initial drafts of written materials for general education courses.
- Complete Dependence: Relying solely on AI for understanding core concepts without engaging in personal study or critical thinking (AI tools can often be wrong).
Important Notes:
- Students must disclose the use of AI tools in their assignments and provide either shareable links or exports (if the tool doesn't support sharing) showing their use of the tools on the assignment.
- AI tools might not strictly adhere to instructions or might include unrequested information. Always review the output to ensure accuracy and relevance.
- Work generated through AI should be critically reviewed and substantially modified to ensure originality and understanding.
Hash Tables
Welcome to week 2!
Learning objectives
By the end of this week, you will be able to:
- trace, compare, and contrast various collision resolution methods, and evaluate their impact on hash table performance.
- apply the concept of hash tables to solve problems involving efficient data lookup and storage.
- compare the effectiveness of hash tables to other data dictionaries, taking into account time and space complexity.
What's due this week
- Week 2 Quiz (on Gradescope), due Thursday before class starts
- Assignment 2, due Sunday
Direct-Address Tables
At the end of last week, we discussed the characteristics of various implementations of data dictionaries. The most efficient variant we have seen so far is balanced binary search trees, which guarantee `O(log n)` lookup and insertion performance. But can we do better?
Ultimately, we will see that hash tables allow us to achieve even better performance than balanced BSTs. However, before we talk about hash tables, we need to discuss direct-address tables.
Direct-address tables
Say we are trying to insert some (key, value) pairs into an initially empty array of length `n`. If we know that each key is in the range `[0, n - 1]`, then we can simply use the key as the index into the array!
For example, say that our (key, value) pairs represent football players. Each key is a player's jersey number, and each value is the player's name. If we can assume that jersey numbers are in the range 0-99, then we can easily insert all of the players into an array of length 100:

To look up the name of a player, we can use their jersey number to index the array and find the name in `O(1)` time.
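Here's a tiny sketch of that idea in Python (the player names are made up for illustration):

```python
# A minimal direct-address table: keys are assumed to be integers in 0-99.
table = [None] * 100

def insert(jersey_number, name):
    table[jersey_number] = name      # the key itself is the index

def lookup(jersey_number):
    return table[jersey_number]      # O(1) access by key

insert(10, "Messi")      # hypothetical players, just for the example
insert(7, "Ronaldo")
print(lookup(10))        # Messi
```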
However, there are a few issues with this approach to storing (key, value) pairs:
- In many cases, we don't know the key space, or range of potential keys, ahead of time. For example, the key could be any number, not just a number in the range 0-99.
- Even if we knew the range of the keys, we have to make our array large enough to fit the maximum value in the range. In other words, if the maximum possible key value was 1,000,000, we would need to make our array large enough to hold 1,000,000 items, even if we were only expecting 1,000 items in total.
Because of these issues related to the data types and ranges of values of keys, directly using keys to index an array won't work in the general case. However, we can achieve a similar result using hash functions.
Hash Tables
In the previous lesson, we saw that there were multiple issues with trying to index a table directly using keys, including the fact that keys can be any value (including strings!).
To solve this, hash tables provide us with the ability to map a key of any data type to an array index. Watch the two videos below to learn about how hash functions work:
To summarize:
- A hash function `h(x)` maps a key `x` to a hash value, which is an integer.
- A key can be mapped to an array position using the expression `h(x) % n`, where `n` is the size of the array.
- An array that uses hash functions to manage (key, value) pairs is a hash table.
- Ideally, hash functions map keys to integers uniformly at random.
- Typically, performing a key lookup in a hash table is a constant time (`O(1)`) operation.
- Since a hash function may map two different keys to the same table position, we need a way to deal with collisions.
Separate chaining
As we previously saw, a hash table can have a collision if two different keys have the same hash value. More formally, we say that two keys $x$ and $y$ collide if $h(x) = h(y)$ where $x \neq y$.
Therefore, we may sometimes run into a situation where we are trying to insert key $y$ into the table, but the key $x$ already occupies the bucket where $y$ belongs! One technique for resolving a collision is known as separate chaining.
Buckets of linked lists
In a separate chaining hash table, the table is represented as an array of buckets, where each bucket contains a linked list of entries:

Now, two (or more) keys that have the same hash value are inserted into the corresponding bucket's linked list. For example, `cat`, `cow`, and `camel` are all hashed to the bucket at index 2 and are therefore stored in a linked list.
Note: from this point on, for simplicity our hash table diagrams will only show the keys that are used to organize the table. However, don't forget that typically, a hash table is used to store (key, value) pairs as a data dictionary.
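To make the structure concrete, here's a minimal separate-chaining sketch in Python (the class name, bucket count, and use of Python's built-in `hash()` are assumptions, and plain lists stand in for the per-bucket linked lists):

```python
class ChainedHashTable:
    def __init__(self, num_buckets=11):
        # each bucket is a list of (key, value) pairs
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: update value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # otherwise chain a new entry

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                       # not found
```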
Building a hash table
Watch the following video to see how to build a separate chaining hash table from a list of keys:
Analyzing separate chaining
Previously, we saw that in many cases, looking up a key in a hash table is `O(1)`. However, collisions complicate the lookup operation. In separate chaining, when we hash a key to a bucket, we then need to walk down the linked list in that bucket to find the key (if it's present).
Worst case analysis
In the worst case, all $n$ keys that are in the table have been hashed to the same bucket, creating a linked list in that bucket that is $n$ nodes long. Looking up a key in this list would therefore take `O(n)` time in the worst case, since we would need to start at the beginning of this list and walk down it until we find the key we're looking for.
Average case analysis
But what about the average case? Is it typically closer to the best case `O(1)` performance, or closer to the worst case `O(n)` performance? To answer this question, we first need to define the load factor of a hash table.
The load factor ($\alpha$, "alpha") of a hash table is the main way to measure how full the hash table is. It is defined as the ratio of the number of entries in the table ($n$) to the overall number of buckets in the table ($m$):
$\alpha = n/m$
For example, the load factor of a hash table of length 100 with 50 items in it is $\alpha = 0.5$.
When looking up a key in a separate-chaining hash table, the average length of the linked lists is the load factor, since it represents $n$ elements spread out over $m$ slots. Therefore, you can think of the average case lookup performance in a separate chaining hash table as `O(n/m)`.
However, as long as the load factor is kept below 1 (i.e., the table is not too full), the time complexity is generally better than `O(log n)`, and approaches `O(1)`.
This is yet another classic time-space tradeoff in computer science. Larger hash tables take up more memory, but have a lower load factor, which means that the linked lists in buckets are shorter, which leads to faster lookup times. Smaller hash tables conserve space, but have longer linked lists and therefore longer lookup times.
Open Addressing I
Separate chaining is just one technique for resolving collisions in hash tables. The other main technique is known as open addressing.
In open addressing, the keys are stored directly in the entries of the hash table (not in buckets of linked lists). If a collision occurs when trying to insert a new key, a different, open entry in the table must be found instead. The process of finding an open entry is known as probing, and we'll consider three different techniques: linear probing, quadratic probing, and double hashing.
Linear probing
In linear probing, whenever a collision occurs in the hash table, we look sequentially in the table for the next open entry, one index at a time. In other words, as long as there are collisions, we continue to look for the next empty cell according to this sequence (wrapping around as needed):
h(x), h(x) + 1, h(x) + 2, h(x) + 3, ...
Watch the following video to see an example of linear probing:
If there is an open cell in the table (and if the table size is a prime number), linear probing will eventually find it. The disadvantage, however, is that linear probing doesn't tend to spread keys throughout the table very well. It can lead to clusters of occupied cells, which leads to longer subsequent probes when looking up values.
To try to correct this behavior, we can consider quadratic probing.
Quadratic probing
Quadratic probing skips increasingly larger sequences of cells when looking for an open position. It uses the following sequence of probes (wrapping around as necessary):
h(x), h(x) + 1^2, h(x) + 2^2, h(x) + 3^2, ...
Watch the following video to see an example of quadratic probing:
The advantage of quadratic probing is that it spreads the keys of the table further apart, which creates smaller clusters of occupied cells compared to linear probing. However, the disadvantage is that quadratic probing is not necessarily guaranteed to find an open cell. Because the probe sequence increases quadratically, there is no mathematical guarantee that we will eventually land on an empty cell.
Therefore, we'll consider one more probing scheme in the next lesson: double hashing.
Open Addressing II
The final probing technique for resolving collisions that we'll consider is double hashing, which actually uses two different hash functions.
Double hashing
In double hashing, we use two hash functions instead of one. When there is a collision when using our first hash function, `h1(x)`, we'll add in increasingly larger factors of our second function, `h2(x)`. The probe sequence is as follows:
h1(x), h1(x) + 1*h2(x), h1(x) + 2*h2(x), h1(x) + 3*h2(x), ...
Because we are repeatedly adding in multiples of `h2(x)`, we must pick an `h2` that can never output 0 -- otherwise, the probe sequence would not change.
Watch the video below to see how double hashing uses two different hash functions to find an empty space in the table:
With double hashing, we get the best of both worlds: a probing scheme that spreads the keys throughout the table like quadratic probing does, and is also guaranteed to find an open position (if one exists, and the table size is a prime number) like linear probing does.
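To make the three probe sequences concrete, here's a small sketch (`h1`, `h2`, the table size `m`, and the toy hash function in the usage example are all assumptions, not functions from the course):

```python
def linear_probe(h1, m, x):
    """Return a function giving the i-th slot visited by linear probing."""
    return lambda i: (h1(x) + i) % m

def quadratic_probe(h1, m, x):
    """Return a function giving the i-th slot visited by quadratic probing."""
    return lambda i: (h1(x) + i * i) % m

def double_hash_probe(h1, h2, m, x):
    """Return a function giving the i-th slot for double hashing (h2(x) != 0)."""
    return lambda i: (h1(x) + i * h2(x)) % m

# Example: the first few linear-probe slots for a toy hash function.
h1 = lambda s: len(s)                  # illustrative only
probe = linear_probe(h1, 11, "cat")
print([probe(i) for i in range(4)])    # [3, 4, 5, 6]
```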
Analysis of open addressing
In many cases, looking up a key in an open addressing hash table is `O(1)`, especially if there are no collisions. However, when there are collisions, we need to utilize one of the probing algorithms, which can inflate the time complexity.
Worst case analysis
In the worst case, the table is full and the key we are looking for (or inserting) is not yet in the hash table. In that case, the probing algorithms will lead us to probe all $m$ slots in the table to ensure that the key is not present, which is `O(m)`. This is because in the worst case, we hash a key to a slot in the hash table, but as we probe through the table, we continue to find full slots until we have checked every slot.
Average case analysis
In the average case, when hashing a key to a slot in the table, we might encounter some small number of collisions during the probing algorithm. However, as long as the load factor is not too high (say, $\alpha < 1/2$), we will be able to find the key (or be sure it's not present) within a constant number of probes. Therefore, similar to separate chaining, under decent conditions (a good hash function and a table that's not too full), we can expect performance that approaches `O(1)`.
Hash Deletion
In the past few lessons, we covered how to insert new items into separate chaining and open addressing hash tables. Let's also now take a look at how to delete an item from a hash table.
Separate chaining
Deleting a key `x` is relatively simple in a separate chaining hash table. The algorithm is:
- Compute `h(x)` to find the bucket where `x` belongs.
- Use the linked list deletion algorithm to delete the node with key `x`.
Open addressing
Deletion in an open addressing hash table is a little bit more challenging, since it can leave gaps in the table that interfere with the probing process. Watch the following video to see why deletion can create gaps, and how to fix it:
To summarize:
- When deleting an item, assign a special "removed" value to that slot in the table.
- For example, "empty" spaces may contain -1 and "removed" spaces may contain -2 as sentinel values
- When performing lookups, we need to skip past any removed slots and continue searching
- When performing insertions, we can insert into either empty or removed slots (whichever comes first), after checking to make sure the key to be inserted doesn't already exist
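Here's a minimal sketch of open-addressing deletion with sentinel values, assuming linear probing and non-negative integer keys (the `EMPTY`/`REMOVED` constants and the `delete` function are illustrative, not a required interface):

```python
EMPTY = -1     # slot never used
REMOVED = -2   # slot previously held a key that was deleted

def delete(table, key, h):
    """Mark key as removed in an open-addressing table using linear probing."""
    m = len(table)
    i = h(key) % m
    for _ in range(m):
        if table[i] == EMPTY:
            return False             # an empty slot means the key isn't here
        if table[i] == key:
            table[i] = REMOVED       # leave a "removed" marker, not a gap
            return True
        i = (i + 1) % m              # keep probing past removed/occupied slots
    return False
```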
Example Hash Function
Throughout these lessons, we have been using relatively simple hash functions that just correspond to the length of a string or the index of the string's first character. However, these aren't necessarily good hash functions for use in actual applications. Remember that hash functions should (more or less) map keys to hash values uniformly at random -- we wouldn't want every word that starts with `'a'` to map to the same hash value!
The following video shows how to define a hash function that can be used on strings:
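As one illustration of the idea (not necessarily the exact function from the video), a simple polynomial string hash looks like this; the multiplier 31 is a common but arbitrary choice:

```python
def hash_string(s, table_size):
    """Map a string to a bucket index by mixing in each character."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % table_size
    return h

print(hash_string("apple", 11))
print(hash_string("avocado", 11))   # words starting with 'a' still spread out
```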
Applications
Hash functions are critical for hash tables as data structures, but they are also a more general mechanism and useful for a variety of applications, especially for security. Optionally, check out the following videos about how hash functions are used in various contexts.
Note: these are not applications of hash tables, just applications of hash functions.
Hash functions and file authenticity
Hash functions and passwords
Hash functions and cryptographic ledgers (blockchain)
Reading
Read about hashing and hash tables in Chapter 7 of An Open Guide to Data Structures and Algorithms.
Practice
Here is a selection of practice problems from LeetCode involving hash tables (in Python, dictionaries):
Week 2 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Assignment 2
📌 This is an individual assignment. You are expected to work independently.
If you get stuck, confused, or have trouble with the assignment, you should use the #help-dsa2 channel in Discord or message an instructor. Remember not to spoil the assignment for others - do not post code samples from your work to public spaces.
Hash Tables
In this assignment, you will complete a series of exercises that test your knowledge of how to use and build hash table algorithms.
▶️ Access the assignment on GitHub Classroom
Remember...
- Read the instructions carefully
- Plan before you code
- Debug if you aren't getting the desired output
- Attend office hours if you need additional support
- Ask for help in Discord
Statement on Usage of AI Tools
The list below details acceptable and unacceptable uses of AI tools in Kibo courses, including generative AI tools like ChatGPT and AI tools integrated into code editors like CoPilot. For use cases not listed below, inquire about their acceptability on the course's Discord Help channel.
Acceptable Uses:
- Generating Ideas: Assisting in brainstorming topics for essays, projects, or presentations.
- Research Aid: Helping to find sources, papers, or literature for further reading and reference.
- Explaining Concepts: Providing explanations of complex concepts in simpler terms or different perspectives.
- Outlining: Aiding in creating outlines of topics to build structure for essays or reports.
- Drafting Assistance: Creating initial drafts of supplementary writing material where the main artifact being produced is not a written item.
- Creative Inspiration: Generating creative ideas for projects in arts, literature, or design.
- Technical Guidance: Offering guidance on coding problems, mathematical equations, or scientific principles.
- Proofreading: Assisting in grammar and spelling checks for written assignments.
Unacceptable Uses:
- AI-Generated Work: Submitting work significantly generated by AI as one's own in essays, reports, code, or other assignments.
- Plagiarism: Using AI to rephrase or copy content without proper citation or acknowledgment.
- Test and Exam Assistance: Using AI tools during tests or exams unless explicitly allowed by the instructor.
- Drafting Assistance: Creating initial drafts of written materials for general education courses.
- Complete Dependence: Relying solely on AI for understanding core concepts without engaging in personal study or critical thinking (AI tools can often be wrong).
Important Notes:
- Students must disclose the use of AI tools in their assignments and provide either shareable links or exports (if the tool doesn't support sharing) showing their use of the tools on the assignment.
- AI tools might not strictly adhere to instructions or might include unrequested information. Always review the output to ensure accuracy and relevance.
- Work generated through AI should be critically reviewed and substantially modified to ensure originality and understanding.
Priority Queues and Heaps
Welcome to week 3!
Learning objectives
By the end of this week, you will be able to:
- Describe how the heap structure allows efficient insertion and deletion of elements while maintaining its ordering property.
- Trace, compare, and contrast heapsort with other sorting algorithms.
- Implement solutions to problems by applying priority queues and heaps.
What's due this week
- Week 3 Quiz (on Gradescope), due Thursday before class starts
Priority Queues
Now that we've finished talking about data dictionaries in the form of search trees and hash tables, let's turn our attention to a new kind of ADT: the priority queue.
A priority queue is a data structure where each element has an associated priority. The priority of an element determines the order in which the element will be removed from the queue. In a max priority queue, the element with the maximum priority would be the first to be removed, the element with the next maximum priority would be removed second, etc. For example:

There is also a min priority queue variation, where the element with the minimum priority would be the first to be removed, etc.
Priority queues are useful for a number of applications, including:
- Scheduling a shared resource, such as the CPU: give some processes a higher priority so that they will be scheduled first
- Huffman encoding: at every step of forming the Huffman tree, you need to know the next two nodes with the highest frequencies; this could be implemented using a priority queue.
- Traffic prioritization: when network bandwidth is limited, packets are prioritized according to their priority, and higher priority packets are sent first
Operations of a PQ
The priority queue ADT generally calls for the following operations:
- `insert(item, priority)`: insert an item with a given priority into the queue
- `remove()`: remove the maximum priority item from the queue
- `peek()`: return the maximum priority element (and/or the priority itself) without actually removing the element
Notice that these operations are very similar to the operations of the queue and stack ADTs that we learned about in a previous course. Indeed, a priority queue is a generalization of the concepts of queues and stacks -- it just makes the priority of each element explicit. In a stack, the priority of an element is related to how recently it was added -- more recently added items have higher priorities.
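As a quick illustration, Python's built-in `heapq` module can already serve as a priority queue. `heapq` implements a min-heap, so one common trick (shown here) is to negate priorities to get max-first behavior; the tasks are made up for the example:

```python
import heapq

pq = []
heapq.heappush(pq, (-3, "low priority task"))
heapq.heappush(pq, (-10, "urgent task"))
heapq.heappush(pq, (-7, "important task"))

while pq:
    priority, item = heapq.heappop(pq)
    print(-priority, item)   # prints 10 urgent, then 7 important, then 3 low
```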
Implementing a PQ
There are various ways that we could implement the PQ ADT using the data structures we have at our disposal. Here are some options:
Unsorted array or linked list
With an unsorted array or linked list, every time the `insert()` operation is called, we could just append the new item onto the end of the data structure. Generally speaking, this should take only `O(1)` time, although in the case of an array we might need to occasionally resize the array.
However, when we go to `remove()` the element with the maximum priority, we would need to do an `O(n)` pass over the array or linked list to (a) find the element and (b) remove it, closing any gaps in the elements by shifting as needed. Even though insertion is very efficient, removal is quite inefficient.
Sorted array or linked list
Perhaps if we maintain a sorted list, then we could do better! If we kept an array or linked list sorted by the priorities, then we would always know that the maximum priority is at one end of the list (the beginning or the end, depending on how you want to implement it). We could therefore easily implement the `remove()` operation in `O(1)` time. However, insertion now pays the price: to keep the list sorted, `insert()` must find the new item's correct position and shift or traverse elements, which takes `O(n)` time.
Linear data structures seem to always require `O(n)` time, one way or another. Could we do better with a binary search tree?
Balanced BST
If we use a balanced BST, with priorities as the keys and items as the values, then we could implement the priority queue in logarithmic time: it takes `O(log n)` time to insert a new (priority, item) pair. Removal also takes `O(log n)` time: although we could always maintain a reference to the maximum priority in the tree for quick access, removing it requires re-balancing the tree, which takes `O(log n)` time. While this is better in some ways than the `O(n)` time required by the linear data structures, it's still not ideal.
Instead, what we need is a new abstraction: a heap.
Heaps
A heap is a type of tree data structure that satisfies the heap property: every node's key is greater than or equal to the keys of its children.
Here are some example heaps:

Note that with the heap property, it necessarily means that the maximum value in the heap is at the root of the tree. This makes it a very natural data structure to use when implementing a priority queue, since the highest priority element is readily available!
The most common way of implementing a heap is by using a complete binary tree.
Complete binary tree
A complete binary tree (note: not a binary search tree) is a binary tree of height h where:
- levels 0 through h - 1 are fully occupied
- in level h, there are no "gaps" to the left of any node
In essence, a complete tree is one where the tree is filled with nodes from top to bottom, left to right, and every level is full of nodes except for possibly the last. Here are some examples of complete trees:

And here are some examples of trees that are not complete (where the node that's missing to prevent it from being complete is highlighted):

Representing a complete binary tree
In our previous data structures course, we represented trees as a linked data structure: every node in a tree had some data element, as well as pointers to its left child, right child, and parent. This works great in the general case for representing trees.
However, when using complete trees (as we are for heaps), we can actually have a simpler representation that just uses an array. This makes it a lot simpler to manage and manipulate the heap.
To represent a complete tree as an array, the tree's nodes are stored in the order given by a level-order traversal: top to bottom, left to right:
![A complete tree showing how it can be interpreted using an array. The root node shows a[0] and its left and right children show a[1] and a[2], respectively. The node with a[1] has a[3] and a[4] as its left and right child, etc.](/images/week-03/dsa2-week3-heap-array.png)

Once the tree is represented as an array (or in our case, a list in Python), you can use the following formulas to navigate the tree:
- The root node is in `lst[0]`
- Given the node in `lst[i]`:
  - its left child is in `lst[2*i + 1]`
  - its right child is in `lst[2*i + 2]`
  - its parent is in `lst[(i - 1) // 2]` (using integer division)
For example, consider this complete tree:

- the left child of the node in `lst[1]` is in `lst[2*1 + 1] = lst[3]`
- the left child of the node in `lst[2]` is in `lst[2*2 + 1] = lst[5]`
- the right child of the node in `lst[3]` is in `lst[2*3 + 2] = lst[8]`
- the parent of the node in `lst[4]` is in `lst[(4 - 1) // 2] = lst[1]`
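These formulas translate directly into code. Here's a small sketch (the list values are arbitrary example data, not necessarily the tree from the figure):

```python
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2   # integer division

lst = [32, 28, 20, 25, 15, 19, 9, 7, 18, 3]   # example complete-tree contents
print(lst[left(1)], lst[right(3)], lst[parent(4)])
```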
Summary
To summarize, we are now looking to implement the priority queue ADT using a max-at-top heap. Most often, heaps are complete trees, which are implemented as arrays due to the simplicity of the representation.
Watch the following video to see a summary of heaps and complete trees:
Heap Insertion
Let's talk about how to insert items into a heap. To do so, let's define a `Heap` class that we can add functionality to, and start it with a list of items and a counter for the current number of items:
```python
class Heap:
    def __init__(self):
        self.contents = []
        self.num_items = 0
```
Recall that since heaps are complete trees, they have a convenient representation as a list. Therefore, we'll avoid creating a linked data structure and instead implement the heap using a list.
Algorithm
Watch the video below to see how to insert an item into a heap:
To summarize:
- The new item initially gets put into the end of the list (a leaf node).
- The value in that leaf node is "sifted up" in the heap until it obeys the heap property.
Efficiency
When inserting an item into the heap, we first place the item in the last position in the list in constant time. However, we then need to run the `sift_up()` algorithm, which in the worst case pushes the item from a leaf node all the way up to the root.
Since the heap is a complete tree, we know that it is also balanced, and therefore we can say that in the worst case, an item is sifted up throughout the height of the tree, where `h = O(log n)`. Each of the swaps that occurs during the sift operation takes constant time.
Therefore, insertion in a heap is `O(log n)`.
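The insertion algorithm relies on a sift-up helper. Here's one possible sketch, written as if it were a method of the `Heap` class above (the method name and details are assumptions; the course implementation may differ):

```python
    def __sift_up(self, i):
        """Swap the item at index i with its parent until the heap property holds."""
        while i > 0:
            parent = (i - 1) // 2
            if self.contents[i] <= self.contents[parent]:
                break                      # parent is already >= the new item
            self.contents[i], self.contents[parent] = \
                self.contents[parent], self.contents[i]
            i = parent
```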
Now that we know how to insert items into a heap, let's take a step back and look at the overall heap construction algorithm and its efficiency. The big-O notation for building a heap may surprise you!
Heap Removal
The `remove()` operation of a heap is extremely important -- usually, the reason why we keep the items in a heap is so that we can efficiently remove them when ready.
The removal algorithm is quite similar to the insertion algorithm, in the sense that we will need to do some sifting to keep the heap in order. However, this time we'll be sifting down instead of sifting up.
Algorithm
Watch the video below to see how to remove the maximum item from a heap:
To summarize, when removing the maximum item from the heap, we can:
- Make a copy of the largest item
- Move the last item in the heap to the root
- "Sift down" the new root item until it is >= its children (or it's a leaf)
- Note: when a parent is less than both of its children, we need to swap the parent with the greater of the two children.
- Return the largest item
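The "sift down" step relies on a helper like the one sketched below, written as if it were a method of the `Heap` class (the exact name and details are assumptions):

```python
    def __sift_down(self, i):
        """Push the item at index i down until it is >= its children or a leaf."""
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < self.num_items and self.contents[left] > self.contents[largest]:
                largest = left
            if right < self.num_items and self.contents[right] > self.contents[largest]:
                largest = right
            if largest == i:
                return                     # heap property holds at this node
            self.contents[i], self.contents[largest] = \
                self.contents[largest], self.contents[i]
            i = largest                    # continue sifting from the child
```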
Efficiency
Rather than discuss the efficiency explicitly, let's put your analytical skills to the test:
Heap Construction
Say that we have a list of numbers that we want to convert into a heap:
[15, 23, 39, 14, 8, 45, 18, 27]
To build a heap, we could take this list of random numbers and insert them, one at a time, into an initially empty heap. We know from the previous lesson that each insertion takes `O(log n)` time, and if there are `n` elements, then in the worst case this process takes `n` insertions * `O(log n)` per insertion = `O(n log n)` time.
However, this algorithm is not optimal. We can actually do better than `O(n log n)`!
Linear-time heap construction
An alternative method to building a heap is to start with all of the elements we want to insert in a complete tree. For example, we would represent the above list as the following complete tree:
![A complete tree with the following list added in level order: [15, 23, 39, 14, 8, 45, 18, 27].](/images/week-03/dsa2-week3-buildheap.png)
Then, we use the following algorithm:
- Find the parent of the last child node.
- Sift down the parent as much as needed to obey the heap property. Remember that if both children are greater than the parent, the parent should be swapped with the greater of the two.
- Proceed to the rest of the nodes in the same level, from right to left. Sift them down as needed.
- Proceed to the next level up in the complete tree, moving from right to left and sifting them down as needed.
- Continue through the rest of the remaining levels.
Watch the following video to see an example of this algorithm:
This algorithm actually works in `O(n)` time! But how?
Explanation of efficiency
Let's make an informal argument for why this heap construction method only takes linear time.
Half the nodes are leaf nodes
First, let's think about how many nodes there are in each level of a complete tree:
Level | Number of nodes |
---|---|
0 | 1 |
1 | 2 |
2 | 4 |
3 | 8 |
... | ... |
h | 2^h |
Also, a complete tree of height h
has 2(h + 1) - 1
nodes total. For example, a tree of height 2 (which has three levels) has 7 nodes total: 1 in the first level, 2 in the second level, and 4 in the third level.
Therefore, it stands to reason that out of the 2^(h + 1) - 1 total nodes in a tree, 2^h
of them are in the last level (leaf nodes). For example, in a tree of height 4, there are 31 total nodes and 16 of them are leaf nodes!
This means that in a complete tree, roughly half of the nodes are leaf nodes. This is important because in the heap construction algorithm, we skip all of the leaf nodes -- we don't need to consider them for sifting down. We start with the parent of the last leaf node.
Most nodes are not sifted far
We showed above that approximately half (~n/2
) of the nodes don't need to be sifted at all because they're leaf nodes. Proceeding with that logic, there are ~n/4
parents of leaf nodes, and although they have to be considered for sifting, they are only sifted at most one level (down into a leaf node).
In fact, the further you work your way up in the tree, there are fewer and fewer nodes that need to be sifted throughout the height of the tree. The root node is the only node that needs to potentially be sifted throughout the full height h
of the tree!
In other words, the time spent sifting a few elements through the full height of the tree is offset by the many nodes that are sifted only a small number of levels or not at all. Although we won't work through the full mathematical argument in this lesson, it turns out that the total running time of heap construction is O(n)
!
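For the curious, here is a sketch of why those savings add up to linear time (this goes a bit beyond what the lesson requires): level $$j$$ of the tree holds $$2^j$$ nodes, and a node at level $$j$$ is sifted down at most $$h - j$$ levels, so the total number of swaps is at most

$$\sum_{j=0}^{h} 2^j (h - j) = 2^h \sum_{k=0}^{h} \frac{k}{2^k} \le 2 \cdot 2^h \le 2n$$

using the facts that $$\sum_{k \ge 0} k / 2^k = 2$$ and that a complete tree of height $$h$$ has at least $$2^h$$ nodes.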
Code for heap construction
Up to this point, we have been using a definition of the Heap
class that has a fairly simple constructor:
class Heap:
    def __init__(self):
self.contents = []
self.num_items = 0
However, we could also use a version of the constructor that accepts a list contents
as a parameter, and builds a heap based on that list. For example:
def __init__(self, contents):
self.contents = contents
    self.num_items = len(contents)
self.__make_heap()
def __make_heap(self):
if len(self.contents) == 0:
return
last = len(self.contents) - 1
parent_of_last = (last - 1) // 2
for i in range(parent_of_last, -1, -1):
self.__sift_down(i)
The __make_heap()
method implements the algorithm described above for starting with the parent of the last leaf node, and repeatedly sifting down each element as needed in order to construct a heap.
In case it's helpful, here's an entire implementation of the Heap
class for your reference:
class Heap:
def __init__(self, contents):
self.contents = contents
self.num_items = len(contents)
self.__make_heap()
def __make_heap(self):
if len(self.contents) == 0:
return
last = len(self.contents) - 1
parent_of_last = (last - 1) // 2
for i in range(parent_of_last, -1, -1):
self.__sift_down(i)
def insert(self, item):
self.contents.append(item)
self.__sift_up(self.num_items)
self.num_items += 1
def remove(self):
if self.num_items == 0:
return None
to_remove = self.contents[0]
self.contents[0] = self.contents[self.num_items - 1]
self.contents.pop()
self.num_items -= 1
if self.num_items > 0:
self.__sift_down(0)
return to_remove
def is_empty(self):
return self.num_items == 0
# to_string - create a string representation of the heap of the form
# { ( root ) ( items in level 1 ) ( items in level 2 ) ... }
def to_string(self):
string = '{ '
start = 0
level_size = 1
while start < self.num_items:
# print all of the items at the current level of the tree
string += '( '
i = start
while i < start + level_size and i < self.num_items:
                string += (str(self.contents[i]) + ' ')
i += 1
string += ') '
# move down to the next level
start += level_size
level_size *= 2
string += '}'
        return string
def __sift_down(self, i):
# Store a reference to the element being sifted.
to_sift = self.contents[i]
# Find where the sifted element belongs.
parent = i
child = 2 * parent + 1
while child < self.num_items:
# If the right child is bigger, compare with it.
if child < self.num_items - 1 and self.contents[child] < self.contents[child + 1]:
child = child + 1
# Check if we're done.
if to_sift >= self.contents[child]:
break
# If not, move child up and move down one level in the tree.
self.contents[parent] = self.contents[child]
parent = child
child = 2 * parent + 1
self.contents[parent] = to_sift
def __sift_up(self, i):
# Store a reference to the element being sifted.
to_sift = self.contents[i]
# Find where the sifted element belongs.
child = i
while child > 0:
parent = (child - 1) // 2
# Check if we're done.
if to_sift <= self.contents[parent]:
break
# If not, move parent down and move up one level in the tree.
self.contents[child] = self.contents[parent]
child = parent
self.contents[child] = to_sift
Heapsort
Back in our previous data structures course, we talked about various sorting algorithms, including selection sort, insertion sort, mergesort, and radix sort. Here's a summary of the efficiencies of those algorithms:
Algorithm | Best case | Worst case | Space |
---|---|---|---|
Selection sort | O(n^2) | O(n^2) | O(1) |
Insertion sort | O(n) | O(n^2) | O(1) |
Mergesort | O(nlogn) | O(nlogn) | O(n) |
Radix sort | O(m*n) | O(m*n) | O(n) |
We were able to define fairly efficient algorithms in mergesort and radix sort, but both of those algorithms require O(n)
extra space to work. Can we do better?
Improving selection sort
Recall selection sort. It repeatedly finds the next smallest element in a list, and swaps it into place. It isn't a particularly efficient algorithm though, since for each of O(n)
elements, it performs an O(n)
pass over the list to find the next smallest element, leading to a running time of O(n2)
.
However, we now have a data structure that allows us to efficiently find the next smallest (or largest) item: a heap! Indeed, we will use a heap to implement heapsort, which will repeatedly find the next largest item in a list and swap it into its final sorted position.
Algorithm
Say that we start with an arbitrary list of numbers. We can use heapsort to sort the list with the following high-level algorithm:
- Turn the list into a heap.
- Remove the maximum item from the heap and place it at the end of the list. This element is now in its final sorted position. Re-heapify the heap.
- Repeatedly remove the maximum item from the heap and place it in its final sorted position until all elements have been removed.
Watch the following video to see how heapsort works:
Practice
Consider the following list:
[20, 18, 32, 3, 11, 7, 17]
Sort the list using heapsort. Show the evolution of the list as it changes in memory.
Pseudocode for heapsort
Here is some pseudocode for the operation of heapsort. It's actually close to working Python code, but it won't quite work with the implementation of the Heap
class that we made, since remove()
actually pops the element from the list (as opposed to just placing it at the end and keeping it there).
def heapsort(lst):
# Create a new heap from the given list.
heap = Heap(lst)
end_unsorted = len(lst) - 1
while end_unsorted > 0:
# Get the largest remaining element and put it
# at the end of the unsorted portion of the array.
largest_remaining = heap.remove()
lst[end_unsorted] = largest_remaining
end_unsorted -= 1
A fun challenge could be to make this version of heapsort()
work with the Heap
class that we have given you, which would involve keeping the list the same length even as elements are removed.
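If you'd like to see what an in-place version can look like, here is a sketch that works directly on the list with its own standalone sift_down() helper -- the names here are illustrative and this is not the course's official solution:
def sift_down(lst, i, size):
    # Sift lst[i] down within lst[0:size], treating that prefix as a max-at-top heap.
    to_sift = lst[i]
    parent = i
    child = 2 * parent + 1
    while child < size:
        # Compare with the larger of the two children.
        if child < size - 1 and lst[child] < lst[child + 1]:
            child += 1
        if to_sift >= lst[child]:
            break
        lst[parent] = lst[child]
        parent = child
        child = 2 * parent + 1
    lst[parent] = to_sift

def heapsort(lst):
    n = len(lst)
    # Step 1: build a max-at-top heap in place (linear time).
    for i in range((n - 2) // 2, -1, -1):
        sift_down(lst, i, n)
    # Step 2: repeatedly move the current maximum to the end of the unsorted portion.
    for end in range(n - 1, 0, -1):
        lst[0], lst[end] = lst[end], lst[0]   # put the max in its final position
        sift_down(lst, 0, end)                # re-heapify the remaining prefix

lst = [20, 18, 32, 3, 11, 7, 17]
heapsort(lst)
print(lst)   # [3, 7, 11, 17, 18, 20, 32]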
Other resources
Here are some other resources for heapsort that you might find helpful:
Heapsort Analysis
Now that we've seen how heapsort operates, let's think about how efficient it is.
Time analysis
We saw in the previous lesson that heapsort consists of two main steps:
- Convert an arbitrary list into a heap.
- Repeatedly remove the maximum element from a heap until you get a sorted list.
As we saw in the heap construction lesson, step (1) above takes time O(n)
. After this step is done, we then repeatedly remove the maximum element. In fact we remove n - 1 = O(n)
such elements, and each call to remove()
takes time O(logn)
in order to put the heap back into order. Therefore, the running time of the second step is O(n) * O(logn) = O(nlogn)
.
Putting the two steps together, we get O(n) + O(nlogn) = O(nlogn)
. This is just as efficient as mergesort!
Space analysis
Although heapsort has the same running time as mergesort, it actually performs better in terms of space. Whereas mergesort requires an auxiliary list to be able to work (which means O(n)
extra space), the heapsort algorithm can be performed in-place, using only a constant amount of extra space for local variables. Notice that in the visualizations of heapsort shown in the previous lesson, all the work is done in the same list that was passed in!
Completing our table of sorting algorithm analysis, we can now add heapsort:
Algorithm | Best case | Worst case | Space |
---|---|---|---|
Selection sort | O(n^2) | O(n^2) | O(1) |
Insertion sort | O(n) | O(n^2) | O(1) |
Mergesort | O(nlogn) | O(nlogn) | O(n) |
Radix sort | O(m*n) | O(m*n) | O(n) |
Heapsort | O(nlogn) | O(nlogn) | O(1) |
heapq Library
So far, we've built our own Heap
class to illustrate the basic algorithms behind heaps. However, there are heap libraries available for you to use in Python as well.
One option is the heapq
module, which provides functionality for a min-at-top heap. In this lesson, we'll cover the three main functions of heapq
: heappush()
, heappop()
, and heapify()
.
heappush()
The heappush()
function adds an element to the heap while maintaining the heap invariant. For example:
import heapq
heap = []
heapq.heappush(heap, 4)
heapq.heappush(heap, 1)
heapq.heappush(heap, 7)
print(heap) # Output: [1, 4, 7]
Notice that the "heap" is just a list, but the heapq
library manages the list to enforce heap properties. After adding the three elements to the list, the list is [1, 4, 7]
, representing a complete tree where 1
is the root node and 4
and 7
are its left and right children, respectively.
Note that you can also use this library to insert tuples, where the first element of the tuple is the priority, and the second element is some associated data:
import heapq
heap = []
heapq.heappush(heap, (4, 'hello'))
heapq.heappush(heap, (1, 'world'))
heapq.heappush(heap, (7, 'foo'))
print(heap) # Output: [(1, 'world'), (4, 'hello'), (7, 'foo')]
heappop()
The heappop()
function removes and returns the smallest element from the heap. After removal, it rearranges the remaining elements to maintain the heap structure.
import heapq
# Construct a heap.
heap = []
heapq.heappush(heap, (4, 'hello'))
heapq.heappush(heap, (1, 'world'))
heapq.heappush(heap, (7, 'foo'))
print(heapq.heappop(heap)) # Output: (1, 'world')
print(heap) # Output: [(4, 'hello'), (7, 'foo')]
When heappop()
is called, the minimum element from the heap is removed, and the rest of the heap is maintained.
heapify()
The heapify()
function transforms an arbitrary list into a heap. For example:
import heapq
heap = [(7, 'foo'), (1, 'world'), (4, 'hello')]
heapq.heapify(heap)
print(heap) # Output: [(1, 'world'), (7, 'foo'), (4, 'hello')]
The heapify()
function uses the same heap construction algorithm that we studied, which builds a heap in linear time.
Example Problem
Now, let's apply what we've learned to solve a problem. Suppose we have a list of integers and we want to find the k
smallest elements. We can use a heap to efficiently find these elements.
import heapq
def k_smallest_elements(nums, k):
smallest = []
heapq.heapify(nums)
for i in range(k):
smallest.append(heapq.heappop(nums))
return smallest
# Example usage:
nums = [5, 8, 2, 10, 3, 6]
k = 3
print(k_smallest_elements(nums, k)) # Output: [2, 3, 5]
The function heapifies the input list in place, and then calls heappop() k times, appending each popped element to the result list of the k smallest elements.
Reading
Read about heaps and priority queues in Chapter 9 of An Open Guide to Data Structures and Algorithms.
Practice
Here is a selection of practice problems from LeetCode involving heaps.
- Kth Largest Element in an Array
- Note that for this problem, you will need to use a max-at-top heap.
- Sort Characters By Frequency
- Merge k Sorted Lists
- Note that for this problem, you will need to use a min-at-top heap.
In these problems, you might find it helpful to use either the heapq
module, or the following Heap
class:
class Heap:
def __init__(self, contents):
self.contents = contents
        self.num_items = len(contents)
self.__make_heap()
def __make_heap(self):
if len(self.contents) == 0:
return
last = len(self.contents) - 1
parent_of_last = (last - 1) // 2
for i in range(parent_of_last, -1, -1):
self.__sift_down(i)
def insert(self, item):
self.contents.append(item)
self.__sift_up(self.num_items)
self.num_items += 1
def remove(self):
if self.num_items == 0:
return None
to_remove = self.contents[0]
self.contents[0] = self.contents[self.num_items - 1]
self.contents.pop()
self.num_items -= 1
if self.num_items > 0:
self.__sift_down(0)
return to_remove
def __sift_down(self, i):
# Store a reference to the element being sifted.
to_sift = self.contents[i]
# Find where the sifted element belongs.
parent = i
child = 2 * parent + 1
while child < self.num_items:
# If the right child is bigger, compare with it.
if child < self.num_items - 1 and self.contents[child] < self.contents[child + 1]:
child = child + 1
# Check if we're done.
if to_sift >= self.contents[child]:
break
# If not, move child up and move down one level in the tree.
self.contents[parent] = self.contents[child]
parent = child
child = 2 * parent + 1
self.contents[parent] = to_sift
def __sift_up(self, i):
# Store a reference to the element being sifted.
to_sift = self.contents[i]
# Find where the sifted element belongs.
child = i
while child > 0:
parent = (child - 1) // 2
# Check if we're done.
if to_sift <= self.contents[parent]:
break
# If not, move parent down and move up one level in the tree.
self.contents[child] = self.contents[parent]
child = parent
self.contents[child] = to_sift
You can copy and paste this Heap
class directly into your code as an inner, nested class. You can then use it like:
h = Solution.Heap([])
h.insert(50)
h.insert(20)
h.insert(75)
print(h.remove())
Week 3 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Dynamic Programming
Welcome to week 4!
Learning objectives
By the end of this week, you will be able to:
- Recognize the characteristics of problems that can be solved using dynamic programming
- Analyze the time and space complexity of dynamic programming solutions, and compare them to alternative approaches like brute force
- Implement dynamic programming solutions for problems exhibiting optimal substructure and overlapping sub-problems
What's due this week
- Week 4 Quiz (on Gradescope), due Thursday before class starts
- Start working on Assignment 3 (due next week)
Rod Cutting Problem
The rod cutting problem is a classic optimization problem in computer science and mathematics. It involves cutting a rod of a certain length into smaller pieces to maximize the total revenue obtained from selling those pieces. This problem has various practical applications, such as in manufacturing, resource allocation, and finance.
Let's take a closer look at this problem to motivate our study of dynamic programming.
Problem Setup
In the rod cutting problem, we are given:
- A rod of length $$n$$ cm
- A table of prices $$p_i$$ for rod pieces of length $$i$$, where $$i$$ ranges from 1 to $$n$$
The goal is to determine the optimal way to cut the rod to maximize revenue. Each cut of the rod is free. Note that if the price $$p_n$$ of a rod of length $$n$$ is large enough, the optimal solution might be to not cut the rod at all.
Example
Let's consider the following table of prices for rod pieces of different lengths:
Length | Price |
---|---|
1 | 1 |
2 | 5 |
3 | 8 |
4 | 9 |
5 | 10 |
6 | 17 |
7 | 17 |
8 | 20 |
Say that we have a rod of length $$n = 4$$. The following image shows all of the possible ways that we could cut a rod of this length:

The optimal solution with this price table is to cut the rod into two rods of length 2 cm, since that leads to a revenue of 10.
Next, let's think about how to write a program that solves this problem.
Solution Approach: Recursive Strategy
One approach to solve the rod cutting problem is through a recursive strategy. We start by considering all possible ways to cut the rod into smaller pieces and calculate the revenue for each combination. We then choose the combination that yields the maximum revenue.
Python Implementation
def max_revenue(length, prices):
if length <= 0:
return 0
# Start max_val at a value that will be overwritten by any number.
max_val = float('-inf')
# Consider every possible combination to cut a rod of length n:
for i in range(1, length + 1):
revenue = prices[i] + max_revenue(length - i, prices)
if revenue > max_val:
max_val = revenue
return max_val
# Example prices table
prices = {
1: 1,
2: 5,
3: 8,
4: 9,
5: 10,
6: 17,
7: 17,
8: 20
}
rod_length = 4
max_revenue = max_revenue(rod_length, prices)
print("Maximum revenue:", max_revenue)
This implementation defines a function max_revenue(length, prices)
that takes the length of the rod and a dictionary of prices as input and returns the maximum revenue that can be obtained by cutting the rod. It recursively considers all possible ways to cut the rod and calculates the revenue for each combination, then returns the maximum revenue among them.
For example, say that we want to know the maximum revenue that we can generate with a rod of length 4
. This is represented by max_revenue(4)
(ignoring the prices
parameter for simplicity). In the stack frame for max_revenue(4)
, we find the maximum among the following:
price(1) + max_revenue(3)
price(2) + max_revenue(2)
price(3) + max_revenue(1)
price(4) + max_revenue(0)
We apply the function recursively, and in the end return the maximum.
Time Complexity Analysis
The time complexity of this recursive approach is exponential, specifically $$O(2^n)$$, where $$n$$ is the length of the rod.
This exponential time complexity arises because the function explores all possible combinations of cuts, leading to a function call tree with $$2^n$$ nodes. Each node in the call tree represents a subproblem, and as the rod length increases, the number of subproblems grows exponentially.
For example, let's consider the call tree for a rod of length 4 cm:

In this call tree:
- At each level, the function makes a decision to cut the rod at a particular length, starting from the maximum length and recursively exploring all possible cuts.
- The tree branches out exponentially as the function explores all possible combinations of cuts.
As we can see, the call tree grows exponentially with the length of the rod.
Rod length | Number of calls |
---|---|
0 | 1 |
1 | 2 |
2 | 4 |
3 | 8 |
4 | 16 |
... | ... |
n | 2^n |
For a rod of length $$n$$, the number of nodes in the call tree is exactly $$2^n$$. This exponential growth in the number of function calls leads to the overall time complexity of $$O(2^n)$$, which is an extremely expensive algorithm!
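If you'd like to verify the table above, write $$T(n)$$ for the number of calls made by max_revenue on a rod of length $$n$$. Each call makes one recursive call for every length from $$0$$ to $$n - 1$$, so:

$$T(0) = 1, \qquad T(n) = 1 + \sum_{i=0}^{n-1} T(i) = 1 + (2^n - 1) = 2^n$$

where the last step follows by induction, since $$\sum_{i=0}^{n-1} 2^i = 2^n - 1$$.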
Next, let's look at a new technique to reduce the complexity of this algorithm: dynamic programming!
Dynamic Programming
In the previous lesson, we explored the rod cutting problem and implemented a solution using a recursive strategy. However, we observed that the recursive approach led to a call tree with $$2^n$$ nodes, which translates to an exponential time complexity, making it impractical for large rod lengths.
Let's take another look at that call tree:

Notice that when computing max_revenue(4)
, we recompute the same values over and over again. We make a call to compute max_revenue(2)
twice, max_revenue(1)
four times, and max_revenue(0)
eight times!
To overcome the inefficiency of the recursive approach, we can use a technique that saves previously computed results and reuses them when needed. This technique, called dynamic programming, allows us to avoid redundant computations and significantly improve the efficiency of our solution.
Version with dynamic programming
Here's how we can modify our previous recursive solution to incorporate dynamic programming. We're going to use a dictionary, saved
, that maps rod lengths to the maximum revenue possible with that length:
def max_revenue(length, prices, saved):
# If we've calculated the max revenue for this length before, use it.
if length in saved:
return saved[length]
if length <= 0:
return 0
# Start max_val at a value that will be overwritten by any number.
max_val = float('-inf')
# Consider every possible combination to cut a rod of length n:
for i in range(1, length + 1):
revenue = prices[i] + max_revenue(length - i, prices, saved)
if revenue > max_val:
max_val = revenue
# Save this computation.
saved[length] = max_val
return max_val
# Example prices table
prices = {
1: 1,
2: 5,
3: 8,
4: 9,
5: 10,
6: 17,
7: 17,
8: 20
}
rod_length = 4
saved = {}
max_revenue = max_revenue(rod_length, prices, saved)
print("Maximum revenue:", max_revenue)
Time Complexity Analysis
With this approach, the time complexity significantly improves to $$O(n^2)$$, where $$n$$ is the length of the rod. This happens because for every cut size, we need to consider the cut sizes of all smaller subproblems. The improvement is achieved by avoiding redundant computations and reusing previously computed results.
However, it's worth noting that the space complexity of the solution increases, due to the storage of intermediate results in the saved
dictionary. The space complexity becomes $$O(n)$$ as well. This is a classic example of a space-time tradeoff -- we're saving time by storing the intermediate results.
Check out the video below to see another explanation of how dynamic programming can be applied to the rod cutting problem:
Dynamic programming properties
Dynamic programming is a powerful technique used to solve problems by breaking them down into smaller subproblems and solving each subproblem only once. There are two key properties that are necessary for dynamic programming to be applicable, which are:
- Optimal Substructure: The solution to a larger problem can be constructed from the solutions to its smaller subproblems.
- Overlapping Subproblems: The same subproblems are encountered multiple times during the computation.
The rod cutting problem exhibits both optimal substructure and overlapping subproblems. The optimal way to cut a rod of length $$n$$ can be obtained by considering the optimal ways to cut its smaller subrods, and we encounter the same subrod lengths multiple times during the computation. Therefore, dynamic programming is well-suited for solving the rod cutting problem efficiently.
Fibonacci Top-Down Dynamic Programming
Let's now take a look at another problem where dynamic programming can be used: the Fibonacci sequence.
We discussed the Fibonacci sequence in the previous data structures course. We defined it to be a series where each number is the sum of the two preceding numbers, starting from $$0$$ and $$1$$. The sequence therefore starts like this:
$$0, 1, 1, 2, 3, 5, 8, 13, 21, ...$$
And continues infinitely. You can define the Fibonacci sequence recursively as:
- Base cases:
- $$\text{fib}(0) = 0$$
- $$\text{fib}(1) = 1$$
- Recursive definition:
- $$\text{fib}(n) = \text{fib}(n-1) + \text{fib}(n-2)$$, for $$n > 1$$
Let's now take a look back at the recursive Python solution that we previously came up with to compute the $$n$$th Fibonacci number.
Naive Fibonacci implementation
We've seen before that we can translate the recursive definition of the Fibonacci sequence above into some Python code as follows:
def fibonacci(n):
if n == 0:
return 0
if n == 1:
return 1
return fibonacci(n - 1) + fibonacci(n - 2)
This implementation is $$O(2^n)$$, since it requires an exponential number of function calls to compute. The analysis for why this is true is actually very similar to the analysis we did for the rod cutting problem. We end up recomputing the same value over and over again!
Suitability for dynamic programming
The calculation of the $$n$$th Fibonacci number exhibits the two characteristics that make it suitable for dynamic programming: optimal substructure and overlapping subproblems.
Remember that optimal substructure means that an optimal solution to a problem can be constructed from optimal solutions to its subproblems. In the case of the Fibonacci sequence, the optimal substructure property holds because the value of the $$n$$th Fibonacci number depends on the values of the $$(n-1)$$th and $$(n-2)$$th Fibonacci numbers.
It also exhibits the characteristic of overlapping subproblems, since there are some subproblems that are reused multiple times. For example, when computing fibonacci(5)
, we need fibonacci(4)
and fibonacci(3)
, both of which require fibonacci(2)
.
Applying dynamic programming
Let's now try applying dynamic programming to the Fibonacci problem.
Start with the naive implementation of fibonacci()
below, and add dynamic programming using a dictionary of pre-computed values. This dictionary, called saved
, has been added as an optional parameter, so its initial value is the empty dictionary.
Add to the dictionary as necessary, and each time the function is called, first consult the dictionary to check whether the solution to that subproblem has already been computed.
When you think you've got it, check out the solution below:
Click to reveal the solution.
def fibonacci(n, saved={}):
if n in saved:
return saved[n]
if n == 0:
return 0
if n == 1:
return 1
saved[n] = fibonacci(n - 1) + fibonacci(n - 2)
return saved[n]
Memoization
This technique of storing the already computed solutions in temporary data structure is known as memoization (note: not the same as memorization!). Memoization is a technique used to improve the performance of recursive algorithms by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Essentially, memoization ensures that each subproblem is solved only once, eliminating redundant computations.
Using memoization is known as a top-down approach to dynamic programming, since we start with the original problem (the "top"), recursively break it down into smaller subproblems, solve these subproblems one by one until we reach the base cases (the "bottom") and memoize the results, and finally combine the solutions of the subproblems to obtain the solution to the original problem.
Check out the video below to learn more about dynamic programming and memoization:
Aside from top-down dynamic programming, there's another approach known as bottom-up. Let's take a look at that next.
Fibonacci Bottom-Up Dynamic Programming
In the previous lesson, we explored memoization, a top-down approach to dynamic programming. We learned how memoization optimizes recursive algorithms by caching results and reusing solutions to overlapping subproblems. Now, let's delve into another approach: bottom-up dynamic programming.
Unlike memoization, which starts with the original problem and recursively breaks it down into smaller subproblems, bottom-up dynamic programming works iteratively from the bottom (i.e., the base cases) up to the original problem.
Bottom-up approach
When devising a bottom-up solution, we can use the following steps:
- Initialization: Solve the base cases of the problem and store their solutions.
- Iterative Solution: Iteratively solve increasingly larger subproblems, building up the solutions based on previously computed ones.
- Optimal Solution: By the end of the iteration, we have computed the solution to the original problem.
Bottom-up dynamic programming is particularly useful when it's easy to determine the order in which subproblems should be solved and when all subproblems need to be solved.
Applying bottom-up to Fibonacci
Let's see how bottom-up dynamic programming can be applied to the problem of computing the $$n$$th Fibonacci number. Instead of recursively solving smaller subproblems, we'll iteratively compute Fibonacci numbers starting from the base cases.
With the steps above in mind, we'll proceed as follows:
- Initialize a list to store the first two Fibonacci numbers: fib[0] = 0 and fib[1] = 1.
- Iterate from 2 to n, computing each Fibonacci number as the sum of the two preceding ones: fib[i] = fib[i - 1] + fib[i - 2].
- The value of fib[n] represents the $$n$$th Fibonacci number.
def fibonacci_bottom_up(n):
fib = [0, 1] # the first two Fibonacci numbers
for i in range(2, n + 1):
next_number = fib[i - 1] + fib[i - 2]
fib.append(next_number)
return fib[n]
Choosing top-down vs. bottom-up
Now that we know there are two different approaches to dynamic programming, you might be wondering which one you should choose when faced with a relevant problem.
In general, the two approaches are interchangeable, but here are some specific advantages of each:
- Top-down (memoization): Intuitive and easy to implement, especially when translating an existing recursive solution to a memoized solution. Efficiently handles problems with a large state space by storing computed results in a cache. Preferred when the problem naturally lends itself to recursion and when only a subset of the state space needs to be explored.
- Bottom-up: Typically more efficient in terms of space complexity, especially for problems with a straightforward order of computation. Avoids the overhead of function calls and recursion. Preferred when all subproblems need to be solved and when the order of computation is well-defined.
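As a small illustration of that space advantage, here is a sketch (not part of the lesson's code) of a bottom-up Fibonacci that keeps only the two most recent values, using $$O(1)$$ extra space instead of an $$O(n)$$ list:
def fibonacci_constant_space(n):
    # prev and curr hold fib(i) and fib(i + 1) as i counts up to n.
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev

print(fibonacci_constant_space(8))   # 21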
Next, let's take a look at one more dynamic programming problem: computing the longest common subsequence.
Longest Common Subsequence
Another classic dynamic programming problem is finding the longest common subsequence (LCS) between two strings. The characters of a subsequence don't have to appear in consecutive positions -- a run of consecutive characters is a substring. For example, given these two strings:
$$S_1 = AABCDD$$
$$S_2 = ABBBD$$
Then the longest common substring between $$S_1$$ and $$S_2$$ is just $$AB$$, but the longest common subsequence between them is $$ABD$$, since that sequence of characters appears in both strings.
Applications
Naive Approach
Like most dynamic programming problems, it's helpful to think of a naive, recursive, and expensive solution first, and then improve on the solution once we have an initial understanding.
Reasoning
Let's take a look at an example instance of the problem: let's try to compute the longest common subsequence between $$AABCDD$$ and $$ABBBD$$.
First, we notice that both strings start with $$A$$. Since the two strings match in this position, that $$A$$ must be a part of the LCS. Therefore, we can say that $$LCS(AABCDD, ABBBD) = A + LCS(ABCDD, BBBD)$$ -- breaking the problem down into a subproblem by removing the initial $$A$$ from both strings. Note that this means the problem exhibits the optimal substructure property!
What about when the first characters of the strings don't match? For example, let's take a look at that next subproblem: $$LCS(ABCDD, BBBD)$$. Since $$A$$ and $$B$$ don't match, we have two options for the next subproblem to consider:
- Take the first character off of $$S_1$$ and see what the LCS is: $$LCS(BCDD, BBBD)$$
- Take the first character off of $$S_2$$ and see what the LCS is: $$LCS(ABCDD, BBD)$$.
Once we've figured out both options, $$LCS(ABCDD, BBBD)$$ is equal to whichever is longer.
Python Implementation
Here's a Python program that implements the naive, recursive approach:
def lcs(s1, s2):
if len(s1) == 0 or len(s2) == 0:
        # At least one of the strings is empty, so there are no
        # characters in the longest common subsequence.
return ''
if s1[0] == s2[0]:
return s1[0] + lcs(s1[1:], s2[1:])
else:
# Pick whichever subproblem ultimately leads to a
# longer LCS.
r1 = lcs(s1, s2[1:])
        r2 = lcs(s1[1:], s2)
if len(r1) > len(r2):
return r1
else:
return r2
Efficiency
This algorithm is quite inefficient, since it has to be called for all possible suffixes of both s1
and s2
. Additionally, many of these function calls are redundant. For example:
lcs('cba', 'dea') calls lcs('cba', 'ea') and lcs('ba', 'dea')
lcs('cba', 'ea') calls lcs('cba', 'a') and lcs('ba', 'ea')
...
lcs('ba', 'dea') calls lcs('ba', 'ea') and lcs('a', 'dea')
...
As you can see, lcs('ba', 'ea')
is called multiple times, in different parts of the call tree. This means that the LCS problem also exhibits overlapping subproblems -- another characteristic of dynamic programming problems.
Since it reconsiders all possible combinations of strings, the efficiency is $$O(2^n * 2^m)$$, where $$n$$ is the length of $$S_1$$ and $$m$$ is the length of $$S_2$$.
Bottom-Up Approach
Above, we noticed that this problem exhibits both required characteristics for a dynamic programming solution. Let's see first how this can be achieved in a bottom-up approach:
To summarize, the video above uses a bottom-up (tabulation) approach, where they solve the smallest subproblems first, and then use the solutions to those subproblems to build up a solution to the overall problem. As is illustrated by the usage of the 2D table, both the time and space complexity for this approach are $$O(mn)$$.
Memoized Approach
We could also solve the LCS problem using top-down dynamic programming with memoization. This would require making $$O(mn)$$ recursive calls as well, since we may need to consider all $$n$$ possible suffixes of $$S_1$$ against all $$m$$ possible suffixes of $$S_2$$. However, we never need to recompute a value after it is found the first time, since it can be stored in a table.
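Here is one sketch of how that memoized version might look -- the function and dictionary names (lcs_memo, saved) are illustrative, and the cache is keyed on the pair of remaining suffixes:
def lcs_memo(s1, s2, saved=None):
    # saved maps a pair of remaining suffixes to their LCS.
    if saved is None:
        saved = {}
    key = (s1, s2)
    if key in saved:
        return saved[key]
    if len(s1) == 0 or len(s2) == 0:
        result = ''
    elif s1[0] == s2[0]:
        result = s1[0] + lcs_memo(s1[1:], s2[1:], saved)
    else:
        # Pick whichever subproblem ultimately leads to a longer LCS.
        r1 = lcs_memo(s1, s2[1:], saved)
        r2 = lcs_memo(s1[1:], s2, saved)
        result = r1 if len(r1) > len(r2) else r2
    saved[key] = result
    return result

print(lcs_memo('AABCDD', 'ABBBD'))   # ABD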
Reading
Read about dynamic programming in Chapter 10 of An Open Guide to Data Structures and Algorithms.
Practice
Here is a selection of practice problems from LeetCode involving dynamic programming:
Week 4 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Assignment 3
Due date: May 12
📌 This is an individual assignment. You are expected to work independently.
If you get stuck, confused, or have trouble with the assignment, you should use the #help-dsa2 channel in Discord or message an instructor. Remember not to spoil the assignment for others - do not post code samples from your work to public spaces.
File Analyzer
In this assignment, you will complete a series of exercises that test your knowledge of how to use heaps and implement dynamic programming algorithms.
▶️ Access the assignment on GitHub Classroom
Remember...
- Read the instructions carefully
- Plan before you code
- Debug if you aren't getting the desired output
- Attend office hours if you need additional support
- Ask for help in Discord
Week 5
Welcome to week 5! In this week, we take a break from learning new material to focus on the midterm exam.
What's due this week
- Midterm exam on Thursday
- Assignment 3, due Sunday
Midterm Exam
We will have our midterm on Thursday, May 9 at 6:00 PM GMT. Here are the details:
Link to Exam
You can access the exam on OnlineExamMaker here: Midterm exam.
Date and time
- Held during our normal class time: Thursday, May 9 at 6:00 PM GMT
- You can access the link to the midterm ahead of time, but it won't be enabled until 6:00 PM GMT the day of the exam
Material
- Everything that was covered in weeks 1 - 4 of the class
- This includes lesson material, live classes, practice problems, assignments
Format
- Administered online using a tool called OnlineExamMaker
- You can prepare and use any amount of notes on paper, but no other resources on the computer or Internet are allowed
- 60 minute exam from the time you begin
- Be sure to start the test between 6:00 - 6:15 PM GMT (it will close after that point)
- Exam will be composed of 20 multiple choice questions, shown one at a time
- You can't go back and change your answer after moving on to the next question
- We won't be using Zoom during the exam; you can just access the exam individually when you're ready
- Plan to have scratch paper for tracing/diagramming etc.
Proctoring software
- The exam website is also a proctoring service
- It will ask you to turn your camera on, and will take some pictures at random during the test
- It will flag any activity such as closing the test, changing programs, switching tabs, etc. If you do this too many times, it will close the exam and you will not be able to continue. Once you start the exam, you should expect to stay on the exam page and complete each question without any interruptions or using your computer for anything else.
- It disables copy and paste during the test
- It gives us the option to monitor your activity
Exam environment
- Plan to be in a test-taking environment
- Quiet, mostly private
Practice exam
- A practice exam is available here so that you can get used to the software and format.
Questions
If you have any questions leading up to the exam, please ask in #help-dsa2 on Discord or reach out to the instructor.
Graphs I
Welcome to week 6!
Learning objectives
By the end of this week, you will be able to:
- Interpret concepts such as nodes, edges, degrees, paths, and cycles in the contexts of graph-based problems
- Compare and contrast various graph representations in terms of memory usage and efficiency for different graph operations
- Recognize and implement appropriate algorithms to solve graph-based problems
What's due this week
- Week 6 Quiz (on Gradescope), due Thursday before class starts
Basics
Let's now turn our attention to yet another fundamental data structure: graphs. Just like other data structures, such as arrays, linked lists, and trees, graphs are a building block on which we can build other abstractions to solve complex problems.
We have discussed graphs to some degree in previous courses, so let's begin with a review of graphs and the associated terminology. To begin with, here's an example of a graph:

Each graph, including the one above, is defined in terms of a set of vertices $$V$$ (also known as nodes) and a set of edges $$E$$. Each edge connects a pair of vertices.
Graphs therefore give us the ability to represent entities and the relationships between them. For example, social networks are most naturally represented as graphs.
Here's another example: a graph that shows the connectedness of some cities:

In this case, the vertices represent cities, and the edges represent the lengths of the roads between these cities. This is actually an example of a weighted graph, since there is a cost associated with each edge. Eventually, we'll be able to represent problems as graphs, and then ask questions like: what's the shortest route from Abuja to Port Harcourt?
Watch the following videos to learn about the basics of graphs:
Terminology
As you can see from the videos, there is a lot of terminology associated with graphs. Let's revisit some of the more important concepts below with examples:
Adjacent
Two vertices are adjacent if they are connected by a single edge.
For example, in the graph below, nodes $$d$$ and $$f$$ are adjacent, but $$f$$ and $$j$$ are not adjacent.

Neighbors
The collection of vertices that are adjacent to a vertex $$v$$ are known as the neighbors of $$v$$. For example, the neighbors of $$d$$ are $$c$$, $$f$$, and $$e$$.
Path
A path in a graph is a sequence of edges that connects two vertices. For example, here's a graph with a highlighted path:

Connected
A graph is connected if there is a path between any pair of vertices. The graph above is connected, but the following six vertices (all part of the same graph) are not connected:

Sometimes, we call the different parts of an unconnected graph islands, as in, the above graph has two islands of vertices.
Note: Depending on context, you might consider the above image to consist of two connected graphs or one unconnected graph.
Complete
A graph is complete if there is an edge between every pair of vertices. Here's an example:

Directed
A directed graph has a direction associated with each edge, which is depicted using an arrow:

Edges in a directed graph are often represented as ordered pairs of the form (start vertex, end vertex). For example, $$(a, b)$$ is an edge in the graph above, but $$(b, a)$$ is not.
Graphs which have no direction on edges are known as undirected graphs.
Cycles
A cycle is a path that leaves a given vertex using one edge, and then later returns to that same vertex using a different edge. For example:

Now that we know a bit about graphs and the terminology used to describe them, we can turn our attention to ways to represent graphs in programs.
Representations
Now that we understand the definition of graphs, we can turn our attention to thinking about how to represent them in computers.
Think back to our time designing data structures such as stacks, queues, and heaps. In those cases, we had choices to make about how to represent them in memory. In particular, we were able to choose between array-based data structures and linked data structures. For example, in a previous course we saw how to represent stacks and queues with both arrays and linked lists, and in this course saw that we could greatly simplify the representation of a heap (a complete tree) by using an array.
For graphs, there are typically three ways that they can be represented in a program's memory:
- Edge list
- Adjacency matrix
- Adjacency list
Let's now talk about each of these representations.
Edge list
Watch the following video to see how to represent a graph using a vertex list and edge list:
To summarize:
- Graphs can be represented as separate lists: one for the vertices $$V$$, and one for the edges $$E$$
- Algorithms like finding all vertices that are adjacent to a given vertex, or checking whether two vertices are adjacent, require time $$O(|E|)$$, which in the worst case can be $$O(|V|^2)$$, which is not very time efficient
Adjacency matrix
Another option is to represent the graph using an adjacency matrix:
To summarize:
- Graphs can be represented using a list of vertices and a two-dimensional matrix that represents the edges between those vertices
- When represented this way, finding the adjacent vertices of a given vertex only requires $$O(|V|)$$ time, and checking whether two vertices are adjacent is even better: $$O(1)$$ time!
- The main disadvantage is that we now require $$O(|V|^2)$$ space to store the edge representation
Let's take a closer look at our cities graph from the previous lesson. If we chose to store it as an adjacency matrix, here's what it would look like:

Adjacency list
The final representation of graphs we will consider is that of an adjacency list:
- For every vertex, we can keep track of the vertices with which it is neighbors using an adjacency list
- This makes (1) finding the adjacent vertices of a given vertex and (2) checking whether two vertices are adjacent run in time $$O(|V|)$$, but typically performs much better than the edge list representation, since we don't need to consider unrelated vertices
Bringing back our cities example, here is our final representation using an adjacency list. Note that two of the vertices and the edge connecting them are highlighted in the diagram:

Traversals
In a previous course, we saw that we can traverse trees in a variety of ways, including depth-first traversals (including pre-order, post-order, and in-order traversals), and breadth-first traversals (which is the same as a level-order traversal).
With graphs, we need similar traversal algorithms. Given a starting point, we need to define ways of traversing the graph to visit all of the nodes -- at least, the ones reachable from the starting point.
Let's talk about the two main traversal techniques: depth-first and breadth-first traversals.
Depth-first traversal
Watch the following video to see how a depth-first traversal works on graphs, and to see what kinds of algorithms you can write with a depth-first traversal:
To summarize:
- A depth-first traversal proceeds as far as possible along a given path before backing up
- A DF traversal is most naturally implemented recursively
- Since graphs can have cycles (unlike trees), we need to mark nodes as "visited" in order to know when the recursion down a particular path should stop
For your reference, here's some pseudocode for a depth-first traversal (assuming an adjacency list graph):
def df_trav(v, parent):
# Visit v.
print(v.id)
# Mark this vertex as visited.
v.done = True
# Mark this vertex's parent in the DF traversal
v.parent = parent
# Iterate over the vertex's adjacency list.
e = v.edges
while e is not None:
w = e.end # Consider a neighbor w
if not w.done:
df_trav(w, v)
e = e.next
Breadth-first traversal
Watch the following video to see how a breadth-first traversal works on graphs, and to see what kinds of algorithms you can write with a breadth-first traversal:
To summarize:
- A breadth-first traversal uses the following algorithm:
- visit a vertex
- visit all of its neighbors
- visit all unvisited vertices 2 edges away
- visit all unvisited vertices 3 edges away, etc
- In order to visit one level at a time, the BF traversal algorithm uses an auxiliary queue
- Nodes still need to be marked as visited during the algorithm, since nodes can be seen in multiple "levels" of the traversal (since there can be cycles)
For your reference, here's some pseudocode to implement the BF traversal (assuming an adjacency list implementation):
def bf_trav(origin):
    origin.encountered = True
    origin.parent = None
q = Queue()
q.insert(origin)
while len(q) > 0:
# Get next vertex in the traversal.
v = q.remove()
# Visit v.
print(v.id)
# Add v's unencountered neighbors to the queue.
e = v.edges
while e is not None:
w = e.end
if not w.encountered:
                w.encountered = True
w.parent = v
q.insert(w)
e = e.next
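If you want to experiment with these traversals without the linked Vertex and Edge classes, here is a small runnable sketch of both traversals on a plain dictionary-based adjacency list (a simplification of the pseudocode above, with illustrative names):
from collections import deque

graph = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c']}

def df_trav(graph, v, visited=None):
    # Visit v, mark it, then recurse into each unvisited neighbor.
    if visited is None:
        visited = set()
    print(v)
    visited.add(v)
    for w in graph[v]:
        if w not in visited:
            df_trav(graph, w, visited)

def bf_trav(graph, origin):
    # Visit vertices level by level using an auxiliary queue.
    encountered = {origin}
    q = deque([origin])
    while q:
        v = q.popleft()
        print(v)
        for w in graph[v]:
            if w not in encountered:
                encountered.add(w)
                q.append(w)

df_trav(graph, 'a')   # a b d c
bf_trav(graph, 'a')   # a b c d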
Minimum Spanning Trees
In the previous lesson, we performed depth-first and breadth-first traversals of the city graph starting from Lagos. For each traversal, we were looking to find the order in which the cities were visited.
However, instead of the sequence of cities, we could have asked for a different output: the spanning tree produced by each traversal. A spanning tree is a subset of a connected graph that contains all of the vertices and a subset of the edges, which form a tree. For example, here is the spanning tree produced by the depth-first traversal:

Note that the spanning tree is formed by connecting each visited child node with its parent node, which is the node that caused the child to be visited via a recursive call.
And here is the spanning tree produced by the breadth-first traversal:

Note that the spanning tree is formed by connecting each visited child node with its parent node, which is the node that added the child to the queue.
Minimum spanning trees
Although both the DF traversal and BF traversal produce spanning trees, neither of the spanning trees produced by the algorithm in the example above are minimum spanning trees. A minimum spanning tree (MST) has the smallest total cost among all possible spanning trees.
For example, the spanning tree produced by the DF traversal has a total edge weight of 2250, while the one produced by the BF traversal has a total edge weight of 2460. However, the MST has a total edge cost of 2050 (it doesn't use the edge between Ibadan and Abuja):

Note that if all edges have unique costs, there is only one MST. If some edges have the same cost, there may be more than one MST.
Computing an MST could be useful for various applications, including determining the shortest highway system for a set of cities, or calculating the smallest length of cable needed to connect a network of computers.
Note:
The MST does not necessarily include the minimal cost path between a pair of vertices. For example, the shortest path between Ibadan and Abuja is the edge that directly connects them, which is not included in the MST.
So how can we compute the MST of a graph? Let's next look at one option: Prim's algorithm.
Prim's Algorithm
There are multiple different ways of finding the minimum spanning tree of a graph. For example, a naive algorithm would just enumerate every possible spanning tree that can be formed from the graph, and would keep the one with the lowest total weight. However, in the worst case, for a connected graph there are $$O(n^n)$$ possible spanning trees, and so this is not a practical algorithm.
One popular alternative for efficiently computing the MST of a graph is Prim's algorithm. Let's see how it works.
Algorithm
The algorithm is as follows:
- Begin with the following two subsets:
- $$A$$ = any one of the vertices
- $$B$$ = all of the other vertices
- Repeatedly do the following:
- Select the lowest-cost edge $$(v_a, v_b)$$ connecting a vertex in $$A$$ to a vertex in $$B$$
- Add $$(v_a, v_b)$$ to the spanning tree
- Move vertex $$v_b$$ from set $$B$$ to set $$A$$
- Continue until set $$A$$ contains all of the vertices.
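Here is a minimal sketch of the steps above in Python, using a dictionary-based weighted adjacency list and a min-at-top heap (heapq) to pick the lowest-cost crossing edge. The names and the example graph are illustrative, not the course's graph.py implementation, and note that the efficiency discussion below analyzes the simpler scanning approach rather than this heap-based variant:
import heapq

def prim(graph, start):
    in_mst = {start}                 # the set A from the steps above
    mst_edges = []
    # Min-at-top heap of candidate edges (cost, vertex_in_A, vertex_in_B).
    candidates = [(cost, start, v) for v, cost in graph[start]]
    heapq.heapify(candidates)
    while candidates and len(in_mst) < len(graph):
        cost, va, vb = heapq.heappop(candidates)
        if vb in in_mst:
            continue                 # both endpoints already in A; skip this edge
        mst_edges.append((va, vb, cost))
        in_mst.add(vb)
        for w, c in graph[vb]:
            if w not in in_mst:
                heapq.heappush(candidates, (c, vb, w))
    return mst_edges

graph = {
    'a': [('b', 4), ('c', 1)],
    'b': [('a', 4), ('c', 2), ('d', 5)],
    'c': [('a', 1), ('b', 2), ('d', 8)],
    'd': [('b', 5), ('c', 8)],
}
print(prim(graph, 'a'))   # [('a', 'c', 1), ('c', 'b', 2), ('b', 'd', 5)]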
Efficiency
Let's think about the efficiency of Prim's algorithm, both for an adjacency matrix and an adjacency list.
When using an adjacency matrix representation, Prim's algorithm has a time complexity of $$O(|V|^2)$$, where $$V$$ represents the vertices in the graph. This is because in each iteration of the algorithm, it scans through the entire adjacency matrix to find the minimum-weight edge connecting a vertex from the MST to a vertex outside the MST. Since there are $$|V|$$ vertices and for each vertex, we might have to consider all other vertices to find the minimum edge weight, the time complexity becomes $$O(|V|^2)$$.
For an adjacency list, in the worst case, you need to scan through all $$|E|$$ edges for every one of the $$|V|$$ nodes. Therefore, the running time is $$O(|V|*|E|)$$.
Example
Recall our city graph from the previous lessons:

Summary
Watch the video below to see a summary of Prim's algorithm:
Topological Sort
Another interesting algorithm that comes up in many applications is topological sort. Topological sort is an algorithm that works specifically on directed acyclic graphs (DAGs), and orders the vertices such that if there is directed edge from $$a$$ to $$b$$, then $$a$$ comes before $$b$$ in the output of the sort.
DAGs
Directed acyclic graphs (DAGs) are a fundamental data structure, often used to represent relationships between objects. As the name suggests, DAGs consist of nodes connected by directed edges, and since they are acyclic, they must not contain any cycles.
The importance of DAGs lies in their applicability to a wide range of problems in various fields. In computer science, DAGs are commonly used in algorithms for tasks such as scheduling and dependency resolution. For example, in software development, DAGs can represent the dependencies between different modules or components of a program, allowing for efficient compilation or build processes.
Beyond computer science, DAGs find applications in domains including biology, finance, and logistics. In finance, they can represent financial transactions and dependencies between assets in a portfolio. In logistics, DAGs can help optimize supply chain management by representing the flow of goods between different locations. Overall, DAGs serve as powerful tools for representing and analyzing complex relationships and dependencies in various domains.
Algorithm
Watch the video below to see how topological sort works:
Strongly Connected Components
Now that we know a bit about graphs and traversals, we can continue our exploration by looking at other problems related to graphs. Check out the following video to see the idea of strongly connected components and how to find them:
Reading
Read about graphs in Chapter 11 (all sections except for "Single Source Shortest Path") of An Open Guide to Data Structures and Algorithms.
Practice
Here is a selection of practice problems from LeetCode involving graphs:
Week 6 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Graphs II
Welcome to week 7!
Learning objectives
By the end of this week, you will be able to:
- Read and write algorithms using adjacency-list graph implementations
- Trace and apply algorithms related to shortest path problems
- Recognize bipartite graphs and apply them to problems such as the stable marriage problem
What's due this week
- Week 7 Quiz (on Gradescope), due Thursday before class starts
- Assignment 4, due Sunday
Graph Implementation
Let's now look more closely at a Python implementation of a graph that we'll be using throughout this week. The implementation uses an adjacency list approach, where we have:
- A linked list of
Vertex
objects representing the vertices of the graph - Each
Vertex
object contains a linked list ofEdge
objects, representing the edges that are connected to the givenVertex
- Each
Edge
object contains a reference to the twoVertex
objects that it connects, along with a cost (weight)
Here's a visual representation of our forthcoming Graph
class:
Try to match up the vertices and edges in the graph on the right with the in-memory representation on the left.
Here's the core definitions of the Graph
, Vertex
, and Edge
objects:
class Graph:
def __init__(self):
self.vertices = None # linked list of vertices in the graph
class Vertex:
def __init__(self, id):
self.id = id # vertex ID
self.edges = None # linked list of edges for this vertex
self.next = None # next vertex in the linked list of vertices
...
class Edge:
def __init__(self, start, end, cost):
self.start = start # starting vertex of the edge
self.end = end # ending vertex of the edge
self.cost = cost # weight of the edge
self.next = None # next edge in the linked list of edges
...
Here's the full implementation: graph.py
Here's a sample graph: graph.txt
When you run graph.py
, you will be prompted to enter both (1) the name of the input file (graph.txt
in this case) and (2) the starting point for the algorithms:
$ python graph.py
Graph info file: graph.txt
starting point: Lagos
depth-first traversal from Lagos:
...
Note that there are a methods provided in graph.py
:
get_vertex()
,add_vertex()
, andadd_edge()
for fetching and building the graphinit_from_file()
, which reads in a graph configuration from a file and creates aGraph
object with the vertices and edges that are listed in the file (example file)- Various graph algorithms:
breadth_first_trav()
,depth_first_trav()
, andprim()
- Since the various graph algorithms may mark the graph (such as to mark vertices as encountered or done), there is a
reinit_vertices()
method that resets the state of each vertex so that the next algorithm can have a clean slate
Dijkstra's Algorithm
Let's now turn to a new problem that graphs can help us solve: the shortest path problem. It's often useful to know the shortest path from one vertex to another – i.e., the one with the minimal total cost. This can help in applications such as routing traffic in the Internet.
For an unweighted graph, we can simply do the following:
- Start a breadth-first traversal from the origin, $$v$$
- Stop the traversal when you reach the other vertex, $$w$$
- The path from $$v$$ to $$w$$ in the resulting (possibly partial) spanning tree is a shortest path
A breadth-first traversal works for an unweighted graph because the shortest path is simply the one with the fewest edges, and a breadth-first traversal visits cities in order according to the number of edges they are from the origin.
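As a small illustration of that idea, here is a hedged sketch of an unweighted shortest-path search built on a breadth-first traversal, using a plain dictionary adjacency list and parent links to rebuild the path (the names are illustrative, not the course's graph.py code):
from collections import deque

def shortest_path(graph, v, w):
    # Breadth-first traversal from v, remembering each vertex's parent.
    parent = {v: None}
    q = deque([v])
    while q:
        cur = q.popleft()
        if cur == w:
            # Rebuild the path by following parent links back to the origin.
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return list(reversed(path))
        for nbr in graph[cur]:
            if nbr not in parent:
                parent[nbr] = cur
                q.append(nbr)
    return None   # w is not reachable from v

graph = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['b', 'c', 'e'], 'e': ['d']}
print(shortest_path(graph, 'a', 'e'))   # ['a', 'b', 'd', 'e']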
For weighted graphs, there are multiple algorithms that find the shortest paths. One very famous example is Dijkstra's algorithm.
Dijkstra's algorithm
Dijkstra's algorithm allows us to find the shortest path from a vertex $$v$$ (the origin) to all other vertices that can be reached from $$v$$.
The basic idea is to maintain estimates of the shortest paths from the origin to every vertex (along with their costs), and gradually refine these estimates as we traverse the graph.
Watch the video below to see how Dijkstra's algorithm works:
Implementation
We can now add Dijkstra's algorithm to our `graph.py` file:
def dijkstra(self, origin_id):
    self.reinit_vertices()
    origin = self.get_vertex(origin_id)
    if origin is None:
        raise ValueError(f"No such vertex: {origin_id}")
    origin.cost = 0
    while True:
        # Find the unfinalized vertex with the smallest cost estimate.
        w = None
        v = self.vertices
        while v:
            if not v.done and (w is None or v.cost < w.cost):
                w = v
            v = v.next
        # Stop when every reachable vertex has been finalized.
        if w is None or w.cost == float('inf'):
            return
        print(f"\tfinalizing {w.id} (cost = {w.cost}" +
              (f", parent = {w.parent.id})" if w.parent else ")"))
        print(f"\t\tpath = {w.path_string()}")
        w.done = True
        # Relax the edges leaving the newly finalized vertex.
        edge = w.edges
        while edge:
            x = edge.end
            if not x.done:
                cost_via_w = w.cost + edge.cost
                if cost_via_w < x.cost:
                    x.cost = cost_via_w
                    x.parent = w
            edge = edge.next
Try adding this code to your `graph.py` file and running it with an example graph.
Example
Recall the graph we've been using that represents some cities and roads on a map (note: some of the edge weights have been slightly changed from previous versions so that the output of the algorithm is more interesting):
Time analysis
The time complexity of Dijkstra's algorithm depends in part on what technique is used to select the vertex with the next-smallest estimate in order to finalize it. For example, for every iteration of the algorithm, we could scan over the entire list of vertices, but a faster way would be to store the vertices in a min-at-top priority queue, where the priority is the vertex's current shortest path estimate. This way, we could repeatedly remove the next-smallest estimate vertex and finalize it.
When using a priority queue, each operation to extract the minimum takes $$O(log|V|)$$ time, and there are $$|V|$$ such operations. Over the course of the algorithm, vertex estimates are decreased according to the $$|E|$$ edges, and each decrease operation requires $$O(log|V|)$$ time to restore the heap ordering. Therefore, the running time is $$O(|E|*log|V| + |V|*log|V|) = O((|E| + |V|)log|V|)$$. In most cases, $$|E| > |V|$$, so the running time is often written as $$O(|E|log|V|)$$.
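To make the priority-queue idea concrete, here is a minimal sketch of Dijkstra's algorithm using Python's built-in `heapq` module. It is not part of `graph.py`: it assumes the graph is given as a plain dictionary mapping every vertex to a list of `(neighbor, weight)` pairs, and it handles "decrease" operations lazily by pushing duplicate heap entries and skipping stale ones.

```python
import heapq

def dijkstra_pq(adj, origin):
    """Shortest-path costs from origin using a min-heap.
    adj maps every vertex to a list of (neighbor, weight) pairs."""
    costs = {v: float('inf') for v in adj}
    costs[origin] = 0
    heap = [(0, origin)]               # (cost estimate, vertex)
    done = set()
    while heap:
        cost, v = heapq.heappop(heap)
        if v in done:
            continue                   # stale entry; skip it
        done.add(v)
        for w, weight in adj[v]:
            new_cost = cost + weight
            if new_cost < costs[w]:
                costs[w] = new_cost
                heapq.heappush(heap, (new_cost, w))
    return costs

# Example with assumed weights:
# adj = {'Lagos': [('Accra', 5)], 'Accra': [('Lagos', 5)]}
# print(dijkstra_pq(adj, 'Lagos'))
```

With this lazy approach the heap can hold up to $$O(|E|)$$ entries, but the overall running time is still $$O(|E|log|V|)$$.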
Summary
Watch the following video to see a quick summary of Dijkstra's algorithm:
Bellman-Ford Algorithm
In addition to Dijkstra's algorithm, there is another common algorithm for solving the single-source shortest paths problem: the Bellman-Ford algorithm.
The Bellman-Ford algorithm is actually slower than Dijkstra's algorithm, but it has one notable advantage: it is capable of solving the shortest paths problem even for graphs that have some edges with negative weights.
Negative edge weights
Negative edge weights are exactly what they sound like: weights assigned to edges in a graph that have a negative value. While they might seem counterintuitive at first, negative edge weights serve various purposes and can model a wide range of real-world scenarios. These scenarios include:
- Time savings: In certain scenarios, negative edge weights can represent time savings. For instance, if the edge weight between two cities represents travel time, a negative weight might indicate that taking a certain route reduces travel time, perhaps due to a shortcut or a faster mode of transportation.
- Financial transactions: In financial networks or economic models, negative edge weights can represent gains or profits. For instance, in a trade network, negative weights could signify profitable transactions or reduced costs associated with certain trade routes.
- Resource allocation: Negative edge weights can also model situations where resources are saved or conserved. For example, in a network representing resource allocation, a negative weight could denote a more efficient use of resources or a reduction in resource consumption along a particular path.
Bellman-Ford algorithm
The algorithm itself is similar to Dijkstra's algorithm, in that it repeatedly relaxes edge estimates. The main difference, however, is that Bellman-Ford repeatedly relaxes all edges in the graph, whereas Dijkstra's algorithm relaxes the edges connected only to each vertex as each vertex is finalized.
Since the Bellman-Ford algorithm can work on graphs with negative weights, a possible output of the algorithm is that there are no shortest paths due to the existence of cycles in the graph with net negative cost. If such negative cycles exist, then the algorithm returns that fact. Otherwise, the algorithm returns the shortest paths from a given vertex.
In pseudocode, the algorithm works as follows:
1. Initialization: Set the distance from the source vertex to itself as 0, and all other distances as infinity.
2. Relaxation: Repeat the following relaxation step $$|V| - 1$$ times:
   - For each edge $$(u, v)$$ in the graph, relax the edge by updating the distance to vertex $$v$$ if the distance to $$u$$ plus the weight of $$(u, v)$$ is less than the current distance estimate to $$v$$.
3. Detection of negative cycles: After the relaxation step, perform an additional iteration over all edges. If any distance is updated during this iteration, it indicates the presence of a negative cycle in the graph.
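Here is a minimal sketch of the algorithm in Python, assuming the graph is given as a list of `(u, v, weight)` edge tuples with vertices numbered `0` to `num_vertices - 1` (a different representation than our `graph.py` classes):

```python
def bellman_ford(num_vertices, edges, source):
    """edges is a list of (u, v, weight) tuples; vertices are 0..num_vertices-1.
    Returns (distances, has_negative_cycle)."""
    dist = [float('inf')] * num_vertices
    dist[source] = 0

    # Relax every edge |V| - 1 times.
    for _ in range(num_vertices - 1):
        for u, v, weight in edges:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight

    # One more pass: any further improvement means a negative cycle exists.
    for u, v, weight in edges:
        if dist[u] + weight < dist[v]:
            return dist, True

    return dist, False
```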
Watch the following video to learn about the Bellman-Ford algorithm:
Time analysis
We mentioned above that the Bellman-Ford algorithm is slower than Dijkstra's algorithm. That's because whereas Dijkstra's algorithm is $$O(|E|log|V|)$$, Bellman-Ford is $$O(|E||V|)$$. This is easily seen from the pseudocode of the algorithm: all $$|E|$$ edges are relaxed $$O(|V|)$$ times.
Summary
Watch the following short videos to summarize the concept of Bellman-Ford and see another example:
Floyd-Warshall Algorithm
Let's explore another fundamental algorithm in the realm of graph theory: the Floyd-Warshall algorithm. While Dijkstra's algorithm and the Bellman-Ford algorithm focus on finding shortest paths from a single source vertex to all other vertices, the Floyd-Warshall algorithm takes a different approach, aiming to find the shortest paths between all pairs of vertices in a weighted graph.
Algorithm
Watch the following video to learn about the Floyd-Warshall algorithm:
Summary:
- The algorithm uses a dynamic programming approach, keeping track of a table `dp` of shortest path distances between each pair of vertices
- The algorithm performs $$|V|$$ iterations
- At each iteration $$k$$, the algorithm considers all pairs of vertices $$i$$ and $$j$$, and updates `dp[i][j]` if there exists a shorter path from $$i$$ to $$j$$ through vertex $$k$$
- At the end of the algorithm, the `dp` table contains the shortest path distances between every pair of vertices
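Here is a short sketch of the algorithm, assuming the graph is given as an `n x n` matrix of edge weights (with `float('inf')` for missing edges and 0 on the diagonal):

```python
def floyd_warshall(weights):
    """weights[i][j] is the edge weight from i to j, float('inf') if there is
    no edge, and 0 on the diagonal. Returns the all-pairs distance table."""
    n = len(weights)
    dp = [row[:] for row in weights]   # start from the direct edge weights

    for k in range(n):                 # allow paths that pass through vertex k
        for i in range(n):
            for j in range(n):
                if dp[i][k] + dp[k][j] < dp[i][j]:
                    dp[i][j] = dp[i][k] + dp[k][j]
    return dp
```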
Time analysis
To find all possible $$|V|^2$$ shortest paths between all vertices $$i$$ and $$j$$, the algorithm does $$O(|V|)$$ work for each shortest path, giving an overall running time of $$O(|V|^3)$$. It's also worth noting that the algorithm requires $$O(|V|^2)$$ extra space to keep track of all of the distances for every pair of vertices.
Summary
Watch the following video for a summary of the Floyd-Warshall algorithm:
Bipartite Graphs
A bipartite graph is a special type of graph that can be divided into two distinct sets of vertices, such that no two vertices within the same set are adjacent. Here's an example:
Formally, a graph $$G = (V, E)$$ is bipartite if its vertex set $$V$$ can be split into two disjoint sets, $$U$$ and $$W$$, with the property that every edge in $$E$$ connects a vertex in $$U$$ to a vertex in $$W$$. This means that there are no edges connecting vertices within the same set.
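One useful consequence of this definition is that a graph is bipartite exactly when its vertices can be two-colored so that no edge joins two vertices of the same color. Here is a small sketch of that check using breadth-first search, assuming the (undirected) graph is given as an adjacency dictionary:

```python
from collections import deque

def is_bipartite(adj):
    """adj maps each vertex to a list of its neighbors (undirected graph).
    Two-colors the graph with BFS; returns True if no edge joins two
    vertices of the same color."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]   # give neighbors the opposite color
                    queue.append(w)
                elif color[w] == color[v]:
                    return False              # same-colored neighbors: not bipartite
    return True
```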
Applications
Bipartite graphs are particularly important in various fields due to their unique structure. They are commonly used to model relationships between two different types of entities. For example, in a bipartite graph representing a job allocation system, one set of vertices might represent workers, and the other set represents jobs, with edges indicating which workers are eligible for which jobs.
Similarly, in a social network analysis, one set might represent people and the other set could represent activities, with edges denoting participation. This structure is also prevalent in fields such as scheduling, matching problems, and network flow, where bipartite graphs facilitate the representation and solution of complex relational problems.
Summary
Watch the following videos to learn more about bipartite graphs:
Stable Marriage Problem
The stable marriage problem (SMP) is a classic problem in the field of combinatorial optimization and computer science, which seeks to find a stable matching between two sets of elements given a set of preferences.
The problem is widely used in matching markets, such as assigning medical residents to hospitals, students to schools, and even in some organ donation programs where donors and recipients are paired. Its importance and utility in economics, game theory, and real-world applications have made it a fundamental problem in algorithm design and analysis, showcasing the intersection of theoretical computer science and practical optimization problems.
The most common scenario involves two equally sized sets of individuals, typically referred to as men and women, each of whom ranks all members of the opposite set based on their preference for marriage partners. The goal is to pair each man with a woman in such a way that no pair of individuals would both prefer each other over their current partners. This condition ensures that no "blocking pairs" exist, which would make the matching unstable.
Bipartite graphs
In the context of the stable marriage problem, bipartite graphs provide a useful way to visualize and understand the matching process. For the SMP, the two disjoint sets in the bipartite graph represent the two groups to be matched, typically men and women. Each vertex in one set (men) is connected to vertices in the other set (women) based on the potential for marriage. An edge between a man and a woman signifies that they are potential partners.
Gale-Shapley algorithm
The Gale-Shapley algorithm, also known as the deferred acceptance algorithm, is a solution to the stable marriage problem that ensures a stable matching between two equally sized sets of participants.
The algorithm proceeds in rounds, with each man proposing to his most-preferred woman who has not yet rejected him. Each woman then reviews her current set of proposals, including any existing tentative engagements, and tentatively accepts the proposal from the man she prefers the most while rejecting the others. Rejected men then move on to propose to their next choice. This process repeats until no man has a new woman to propose to, at which point all tentative engagements become final.
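Here is a compact sketch of the algorithm, assuming the preferences are given as dictionaries mapping each person to their ranked list of members of the other group:

```python
def gale_shapley(men_prefs, women_prefs):
    """men_prefs[m] is m's list of women in preference order;
    women_prefs[w] is w's list of men in preference order.
    Returns a stable matching as a dict {woman: man}."""
    # Precompute each woman's ranking of the men so comparisons are O(1).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    free_men = list(men_prefs)                 # men with no tentative partner
    next_proposal = {m: 0 for m in men_prefs}  # index of the next woman to propose to
    engaged_to = {}                            # woman -> man

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m                  # w accepts her first proposal
        elif rank[w][m] < rank[w][engaged_to[w]]:
            free_men.append(engaged_to[w])     # w trades up; her old partner is free again
            engaged_to[w] = m
        else:
            free_men.append(m)                 # w rejects m; he stays free
    return engaged_to
```

With complete preference lists and equally sized groups, every man eventually becomes engaged, so the loop always terminates with a stable matching.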
Watch the following videos to learn more about the stable marriage problem and Gale-Shapley algorithm:
Week 7 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Assignment 4
Due date: May 26
📌 This is an individual assignment. You are expected to work independently.
If you get stuck, confused, or have trouble with the assignment, you should use the #help-dsa2 channel in Discord or message an instructor. Remember not to spoil the assignment for others - do not post code samples from your work to public spaces.
Graph Algorithms
In this assignment, you will complete a series of exercises that test your knowledge of graph algorithms.
▶️ Access the assignment on GitHub Classroom
Remember...
- Read the instructions carefully
- Plan before you code
- Debug if you aren't getting the desired output
- Attend office hours if you need additional support
- Ask for help in Discord
Advanced Algorithms
Welcome to week 8!
Learning objectives
By the end of this week, you will be able to:
- Trace examples of quicksort, explain its usage as a divide-and-conquer algorithm, and analyze its average case running time
- Analyze and differentiate between naive string matching algorithms and more advanced algorithms such as Rabin Karp
- Formulate and solve linear programming problems
- Describe the concept of duality as it relates to optimization problems
What's due this week
- Week 8 Quiz (on Gradescope), due Thursday before class starts
Quicksort
In this course and the previous DSA course, we have seen many different sorting algorithms:
- Insertion sort
- Selection sort
- Radix sort
- Merge sort
- Heapsort
We have finally reached our last sorting algorithm: quicksort. Similar to merge sort, quicksort is a divide-and-conquer algorithm, since it repeatedly divides a list to break it down into smaller and smaller subproblems, and recursively sorts those subproblems.
Let's take a closer look at how quicksort works.
Algorithm
Watch the following videos to learn about quicksort:
To summarize:
- Quicksort is a recursive, divide-and-conquer algorithm
- It repeatedly rearranges (partitions) the elements so that we end up with two sublists that meet the following criterion: each element in the left sublist is <= each element in the right sublist
- It applies quicksort recursively to the sublists, stopping when a sublist has a single element
Choice of pivot
There are many possible ways of choosing the quicksort pivot, including:
- The first element or the last element
- Risky, because it could lead to worst case behavior if the list is sorted or almost sorted
- The middle element of the list
- A randomly chosen element
- The median of three elements
- For example, the median of the first, last, and middle elements
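Here is a minimal in-place quicksort sketch that uses the median-of-three pivot strategy described above; this is just one of many ways to implement the partition step:

```python
def quicksort(items):
    """Sort a list in place using quicksort with a median-of-three pivot."""
    _quicksort(items, 0, len(items) - 1)

def _quicksort(items, low, high):
    if low >= high:
        return
    pivot = _median_of_three(items, low, high)
    # Partition: move elements <= pivot to the left, >= pivot to the right.
    i, j = low, high
    while i <= j:
        while items[i] < pivot:
            i += 1
        while items[j] > pivot:
            j -= 1
        if i <= j:
            items[i], items[j] = items[j], items[i]
            i += 1
            j -= 1
    _quicksort(items, low, j)      # recursively sort the left sublist
    _quicksort(items, i, high)     # recursively sort the right sublist

def _median_of_three(items, low, high):
    mid = (low + high) // 2
    return sorted([items[low], items[mid], items[high]])[1]

# Example:
# data = [7, 2, 9, 4, 1, 8]
# quicksort(data)   # data is now [1, 2, 4, 7, 8, 9]
```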
Application
Quicksort is widely recognized for its impact and extensive application in various fields due to its efficient average-case performance and in-place sorting capability. It is frequently used in systems where speed is crucial, such as in database management, search engines, and large-scale data processing.
Quicksort's ability to handle large datasets efficiently makes it a preferred choice in many real-time applications and performance-critical environments. The algorithm's in-place sorting feature minimizes memory usage, making it suitable for systems with limited resources. Additionally, quicksort is often the default sorting algorithm in programming language libraries, including C's standard library (`qsort`) and Java's `Arrays.sort` for primitive types, which uses a dual-pivot quicksort. (Python's built-in sort uses Timsort, a hybrid of merge sort and insertion sort, rather than quicksort.) Its combination of simplicity, speed, and low memory overhead ensures that quicksort remains a foundational algorithm in computer science and software engineering.
Summary
Here are a couple more good videos about quicksort, in case you need some more examples:
Quicksort Analysis
We mentioned in the last lesson that quicksort is similar to merge sort, since they are both divide-and-conquer algorithms that improve on the running time of more primitive sorting algorithms (such as insertion sort or selection sort). However, whereas merge sort is $$O(nlogn)$$ in all cases, quicksort has a more nuanced running time that depends on the contents of the list.
Best case and worst case
Watch the following video to see the best case and worst case running time behavior of quicksort:
To summarize:
- In the best case, each partition divides the remaining portion of the list approximately in half. In this case, there are $$O(logn)$$ levels of partitioning -- when visualized as a call tree, the height of the tree is $$O(logn)$$. At each level of that tree, $$O(n)$$ operations are performed to do the partitioning. Therefore, the best case running time is $$O(nlogn)$$.
- In the worst case, each partition reduces the size of the list by one element, i.e., the pivot is chosen to be the smallest or largest element of each sublist. In this case, the call tree has a height of $$O(n)$$, with $$O(n)$$ operations performed at each level to perform the partition, leading to a running time of $$O(n^2)$$.
- Notice that the choice of pivot can have a significant impact on the running time of the algorithm.
Average case
In the average case of quicksort, there is a mixture of good and bad partitions. In practice, the running time in the average case is closer to $$O(nlogn)$$ than it is to $$O(n^2)$$. Watch the following video to see an empirical explanation of the average case:
String Matching
We now leave quicksort to look at a familiar algorithm: string matching. In other words, given a string $$S$$, we want to find whether a string $$W$$ occurs within that string.
For example, if $$S$$ is `ABC ABCDAB ABCDABCDABDE` and $$W$$ is `ABCDABD`, then $$W$$ appears in $$S$$ (at position 15).
Naive algorithm
At this point, you're likely familiar with the "naive" string matching algorithm, which checks every possible position in $$S$$ for $$W$$:
def naive_string_match(s, w):
n = len(s)
m = len(w)
# Loop through the text from the start to the point where the pattern can fit
for i in range(n - m + 1):
# Check for a match at this position
match = True
for j in range(m):
if s[i + j] != w[j]:
match = False
break
# If match is still True, we found a match
if match:
return i
return -1
The worst-case running time for this algorithm is $$O(n * m)$$, where $$n$$ is the length of the text $$S$$ and $$m$$ is the length of the pattern $$W$$. This is because the algorithm compares each character of the pattern with the text for each possible starting position.
However, we can do better than this algorithm. One example is the Knuth-Morris-Pratt algorithm.
Knuth-Morris-Pratt algorithm
The Knuth-Morris-Pratt (KMP) algorithm improves on naive string search using the following algorithm:
- Preprocessing phase: The KMP algorithm preprocesses the pattern to create a partial match (or "failure") table, which takes $$O(m)$$ time.
- Matching phase: The actual string matching is done in $$O(n)$$ time. The KMP algorithm avoids redundant comparisons by using the information from the partial match table, ensuring that each character in the text is compared at most once.
Watch the following video to see how the KMP algorithm works:
As described above, this algorithm is significantly faster, and operates in $$O(n + m)$$ time. However, the space complexity of the algorithm increases to $$O(m)$$, which is more space than the naive algorithm.
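Here is a small sketch of the two phases, for illustration only (the failure-table convention shown here is one of several equivalent formulations):

```python
def kmp_search(s, w):
    """Return the index of the first occurrence of pattern w in text s, or -1."""
    if not w:
        return 0
    # Preprocessing: failure[j] is the length of the longest proper prefix of
    # w[:j+1] that is also a suffix of it.
    failure = [0] * len(w)
    k = 0
    for j in range(1, len(w)):
        while k > 0 and w[j] != w[k]:
            k = failure[k - 1]
        if w[j] == w[k]:
            k += 1
        failure[j] = k

    # Matching: scan the text once, never re-examining matched characters.
    k = 0
    for i, ch in enumerate(s):
        while k > 0 and ch != w[k]:
            k = failure[k - 1]
        if ch == w[k]:
            k += 1
        if k == len(w):
            return i - len(w) + 1   # match ends at position i
    return -1

# print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))   # 15
```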
Rabin-Karp Algorithm
Believe it or not, we can improve even further on the KMP algorithm for the string matching problem. With the Rabin-Karp algorithm, we can search for a string $$W$$ in $$S$$ again in $$O(n + m)$$ time, but by using only $$O(1)$$ extra space.
Algorithm
Watch the following video to see how the Rabin-Karp algorithm works:
To summarize:
- The algorithm uses hashing to convert the pattern and substrings of the text into numerical values. This allows for quick comparisons between the pattern and text substrings based on their hash values.
- The algorithm employs a rolling hash function to efficiently compute the hash value of the next substring in constant time by updating the hash value of the current substring. This avoids recomputing the hash from scratch for each substring.
- If the hash values of the pattern and a text substring match, a direct comparison is performed to confirm the match. This step ensures that false positives (hash collisions) are correctly identified and handled.
- The main drawback of the Rabin-Karp algorithm is the possibility of hash collisions, where different substrings have the same hash value. This requires additional verification steps, potentially affecting performance. Additionally, the algorithm's average-case time complexity is $$O(n + m)$$, but its worst-case time complexity can degrade to $$O(n * m)$$ if many hash collisions occur.
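Here is a minimal sketch of the idea, using a polynomial rolling hash; the specific `base` and `mod` values are arbitrary choices for illustration:

```python
def rabin_karp(s, w, base=256, mod=10**9 + 7):
    """Return the index of the first occurrence of pattern w in text s, or -1."""
    n, m = len(s), len(w)
    if m == 0:
        return 0
    if m > n:
        return -1

    high = pow(base, m - 1, mod)      # weight of the leading character
    target = 0                        # hash of the pattern
    window = 0                        # hash of the current text window
    for i in range(m):
        target = (target * base + ord(w[i])) % mod
        window = (window * base + ord(s[i])) % mod

    for i in range(n - m + 1):
        # On a hash match, verify directly to rule out collisions.
        if window == target and s[i:i + m] == w:
            return i
        if i < n - m:
            # Roll the hash: drop s[i], append s[i + m].
            window = ((window - ord(s[i]) * high) * base + ord(s[i + m])) % mod
    return -1

# print(rabin_karp("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))   # 15
```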
Summary
Here's another video that summarizes the Rabin-Karp technique:
Linear Programming
Linear programming (LP) is a mathematical method used for optimizing a linear objective function, subject to a set of linear inequality or equality constraints.
The primary goal of linear programming is to find the best possible outcome, such as maximum profit or lowest cost, in a mathematical model whose requirements are represented by linear relationships. Linear programming is widely used in various fields, including economics, business, engineering, and military applications, for solving problems related to resource allocation, production planning, transportation, and scheduling.
Formulating LP problems
To formulate a linear programming problem, you need to define three main components:
- the objective function
- the decision variables
- the constraints
The objective function is a linear equation that represents the goal of the optimization, such as maximizing revenue or minimizing costs. The decision variables are the unknowns that you are solving for, and they represent the quantities you want to determine. The constraints are linear inequalities or equalities that represent the limitations or requirements of the problem.
For example, consider a company that produces two products, $$A$$ and $$B$$. The company wants to maximize its profit, where the profit from product $$A$$ is $5 per unit and from product $$B$$ is $4 per unit.
The production of these products is subject to resource constraints: each unit of product $$A$$ requires 2 hours of labor and 1 unit of material, while each unit of product $$B$$ requires 1 hour of labor and 2 units of material. The company has a total of 100 hours of labor and 80 units of material available. The linear programming model for this problem would be:
Objective function: Maximize profit = $$5A + 4B$$
Constraints:
- $$2A + B \leq 100$$ (labor constraint)
- $$A + 2B \leq 80$$ (material constraint)
- $$A, B \geq 0$$ (non-negativity constraint)
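If you want to experiment with this example, the sketch below solves it with SciPy's `linprog` routine (assuming SciPy is installed); since `linprog` minimizes, the profit coefficients are negated:

```python
from scipy.optimize import linprog

# Maximize 5A + 4B by minimizing its negation.
c = [-5, -4]
A_ub = [[2, 1],    # labor:    2A +  B <= 100
        [1, 2]]    # material:  A + 2B <=  80
b_ub = [100, 80]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
A_units, B_units = result.x
print(f"A = {A_units:.1f}, B = {B_units:.1f}, profit = {-result.fun:.1f}")
```

For this example, the optimal plan should come out to $$A = 40$$ and $$B = 20$$, for a profit of $280.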
Simplex algorithm
Linear programming problems can be solved using various methods, the most common of which is the simplex algorithm.
To see an overview of linear programming and the simplex algorithm, watch the following video:
Duality
Note: this lesson is optional.
An important concept in linear programming is duality. Every linear programming problem (referred to as the primal problem) has a corresponding dual problem. The dual problem provides insights into the original problem and can sometimes be easier to solve.
The relationship between the primal and dual problems is such that the optimal solution to one provides information about the optimal solution to the other. Specifically, the value of the objective function in the primal problem at the optimal solution equals the value of the objective function in the dual problem at the optimal solution. This principle is known as the Strong Duality Theorem.
Primal Problem Example
Suppose a company produces two types of products, $$x_1$$ and $$x_2$$. The company wants to maximize its profit, with profits of $3 per unit of $$x_1$$ and $5 per unit of $$x_2$$. The production is subject to the following constraints:
- The production of $$x_1$$ and $$x_2$$ requires different amounts of two resources, $$R1$$ and $$R2$$.
- Resource $$R1$$ has a maximum availability of 4 units.
- Resource $$R2$$ has a maximum availability of 12 units.
- Each unit of $$x_1$$ requires 1 unit of $$R1$$ and 3 units of $$R2$$.
- Each unit of $$x_2$$ requires 2 units of $$R1$$ and 2 units of $$R2$$.
The primal linear programming problem can be formulated as follows:
Primal Problem:
Maximize $$z = 3x_1 + 5x_2$$
subject to:
$$ \begin{align*} x_1 + 2x_2 & \leq 4 & \text{(Resource R1 constraint)} \\ 3x_1 + 2x_2 & \leq 12 & \text{(Resource R2 constraint)} \\ x_1, x_2 & \geq 0 & \text{(Non-negativity constraint)} \end{align*} $$
Dual Problem Example
To formulate the dual problem, we follow these steps:
- Each constraint in the primal problem corresponds to a variable in the dual problem.
- The objective function coefficients in the primal problem become the right-hand side constants in the dual problem.
- The right-hand side constants in the primal problem become the objective function coefficients in the dual problem.
- The inequalities are reversed, and the primal's "less than or equal to" constraints become the dual's "greater than or equal to" constraints.
Dual Problem:
Minimize $$ w = 4y_1 + 12y_2 $$
subject to:
$$ \begin{align*} y_1 + 3y_2 & \geq 3 & \text{(from coefficient of } x_1 \text{ in primal objective)} \\ 2y_1 + 2y_2 & \geq 5 & \text{(from coefficient of } x_2 \text{ in primal objective)} \\ y_1, y_2 & \geq 0 & \text{(Non-negativity constraint)} \end{align*} $$
Interpretation
- Primal Problem: The company wants to maximize profit by deciding how much of each product ($$x_1$$ and $$x_2$$) to produce, given the resource constraints.
- Dual Problem: The dual problem can be interpreted as a resource allocation problem, where $$y_1$$ and $$y_2$$ represent the shadow prices of resources $$R1$$ and $$R2$$, respectively. The objective is to minimize the total cost of resources, subject to the constraint that the resource prices provide at least as much value as the profit contributions of the products.
Relationship Between Primal and Dual
According to the Strong Duality Theorem, if both the primal and dual problems have feasible solutions, then the optimal value of the objective function in the primal problem (maximum profit) equals the optimal value of the objective function in the dual problem (minimum cost). This relationship provides valuable insights into the economics of resource allocation and pricing in optimization problems.
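As an optional sanity check, here is a sketch that solves both problems numerically with SciPy's `linprog` (assuming SciPy is available); the "greater than or equal to" dual constraints are rewritten as "less than or equal to" constraints on their negations:

```python
from scipy.optimize import linprog

# Primal: maximize 3x1 + 5x2, so minimize -(3x1 + 5x2).
primal = linprog(c=[-3, -5],
                 A_ub=[[1, 2], [3, 2]], b_ub=[4, 12],
                 bounds=[(0, None), (0, None)])

# Dual: minimize 4y1 + 12y2, with ">=" constraints negated into "<=" form.
dual = linprog(c=[4, 12],
               A_ub=[[-1, -3], [-2, -2]], b_ub=[-3, -5],
               bounds=[(0, None), (0, None)])

print("primal optimum:", -primal.fun)   # maximum profit
print("dual optimum:  ", dual.fun)      # minimum resource cost
```

For this example, both optima should come out to 12, illustrating the Strong Duality Theorem.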
Practice
Here is a selection of practice problems from LeetCode involving this week's topics:
Week 8 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Intractable Problems
Welcome to week 9!
Learning objectives
By the end of this week, you will be able to:
- Describe the difference between problems in various complexity classes
- Explain the traveling salesman problem and solve it using naive and approximation algorithms
- Recognize the challenges in solving whether P = NP
What's due this week
- Week 9 Quiz (on Gradescope), due Thursday before class starts
- Assignment 5, due Sunday
Traveling Salesman Problem
To begin this week's material, let's tackle a very famous problem in computer science: the traveling salesman problem (TSP).
Here's one way of thinking about the TSP: Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?
Definition
The input to the traveling salesman problem is:
- A set of $$n$$ cities.
- A distance matrix (or a set of distances in some format) where each entry specifies the distance between two cities.
The objective is to find the shortest possible route that visits each city once and returns to the starting city.
Watch the following video to see an overview of the TSP:
Brute force algorithm
One way of solving the TSP is to use a brute force approach -- generating all possible permutations of the cities to find the shortest possible route that visits each city exactly once and returns to the starting city.
Here’s a detailed description of this approach:
1. Generate all possible permutations of the cities. For $$n$$ cities, there are $$n!$$ ($$n$$ factorial) possible permutations. However, since the route is a cycle (i.e., starting and ending at the same city), we can fix one city as the starting point and only consider permutations of the remaining $$n-1$$ cities. This reduces the number of permutations to $$(n-1)!$$.
2. For each permutation (route), calculate the total distance by summing the distances between consecutive cities in the permutation and adding the distance from the last city back to the starting city.
3. Compare the total distances of all permutations and keep track of the permutation with the shortest distance.
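Here is a short brute-force sketch using `itertools.permutations`, assuming the distances are given as an `n x n` matrix and the tour starts and ends at city 0:

```python
from itertools import permutations

def tsp_brute_force(dist):
    """dist is an n x n distance matrix; the tour starts and ends at city 0.
    Returns (best_cost, best_route)."""
    n = len(dist)
    other_cities = range(1, n)       # fix city 0 as the starting point
    best_cost, best_route = float('inf'), None
    for perm in permutations(other_cities):
        route = (0,) + perm + (0,)
        cost = sum(dist[route[i]][route[i + 1]] for i in range(n))
        if cost < best_cost:
            best_cost, best_route = cost, route
    return best_cost, best_route
```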
Complexity
The brute force approach has a time complexity of $$O(n!)$$, where $$n$$ is the number of cities. This factorial growth makes the brute force approach impractical for large $$n$$. The space complexity is $$O(n)$$, for storing the current permutation and other necessary variables.
Due to its factorial time complexity, the brute force approach is only feasible for small instances of the TSP (usually up to about 10-12 cities). For larger instances, the computation time becomes prohibitively large, necessitating the use of more sophisticated algorithms.
Held-Karp algorithm
One such algorithm is the Held-Karp algorithm. Proposed by Michael Held and Richard M. Karp in 1962, this algorithm is a cornerstone method in computer science, and significantly reduces the computational complexity of solving the TSP compared to brute-force methods. It leverages the principle of dynamic programming to break down the problem into smaller subproblems, which are solved once and stored for future use, thus avoiding redundant calculations.
The algorithm works by iteratively building up the solution using subsets of the cities. It starts with the smallest possible subsets and progressively adds more cities, calculating the minimum cost to reach each subset's final city from the starting point. For each subset of cities, it records the shortest path that ends at each specific city within the subset. This process continues until it encompasses all cities, ultimately determining the shortest path for the complete set.
Watch the following video to learn about the Held-Karp algorithm:
Complexity
Although the Held-Karp algorithm reduces the time complexity significantly from factorial to exponential (specifically, $$O(n^2*2^n)$$), it is still computationally intensive for very large numbers of cities. Despite this, it remains a critical theoretical approach and a benchmark for evaluating heuristic and approximation algorithms for the TSP.
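Here is a sketch of the dynamic programming idea, again assuming an `n x n` distance matrix with at least two cities; it returns only the optimal tour cost (recovering the tour itself would additionally require storing parent pointers):

```python
from itertools import combinations

def held_karp(dist):
    """Return the cost of the optimal tour starting and ending at city 0.
    Uses O(n^2 * 2^n) time and O(n * 2^n) space; assumes n >= 2."""
    n = len(dist)
    # best[(subset, j)] = cheapest cost of starting at city 0, visiting every
    # city in subset, and ending at j (subset never contains city 0).
    best = {}
    for j in range(1, n):
        best[(frozenset([j]), j)] = dist[0][j]

    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            subset = frozenset(subset)
            for j in subset:
                prev = subset - {j}
                best[(subset, j)] = min(best[(prev, k)] + dist[k][j]
                                        for k in prev)

    full = frozenset(range(1, n))
    return min(best[(full, j)] + dist[j][0] for j in full)
```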
Hard Problems
For most of the problems that we have seen in our study of algorithms, we have seen polynomial time algorithms. Polynomial time algorithms are generally considered to be fairly efficient. For many of the simple algorithms we've seen (searching, sorting, etc.), we are used to seeing logarithmic or linear algorithms, and consider polynomial time algorithms (e.g., selection sort as an $$O(n^2)$$ algorithm) to be inefficient. However, for more complex, real-world problems (especially with graphs), polynomial time algorithms are considered fairly efficient.
As seen in the previous lesson, the traveling salesman problem is a hard problem to solve. In fact, it is an intractable problem -- a problem for which no known polynomial time algorithm exists.
To learn more about this and other, similar hard problems, watch the following videos:
Complexity Classes
To formalize the concepts of whether algorithms can be solved in polynomial time (or, whether a solution can be verified in polynomial time), we define the concept of complexity classes. Watch the following videos to learn more:
P vs. NP
The P vs. NP problem is one of the most fundamental and profound questions in computer science and mathematics, concerning the relationship between two classes of problems: those that can be solved in polynomial time (P) and those for which a solution can be verified in polynomial time (NP).
Specifically, the problem asks whether every problem whose solution can be verified quickly (in polynomial time) can also be solved quickly (in polynomial time), which would imply P = NP.
Despite significant efforts, no one has yet proven whether P equals NP or not. If P were equal to NP, it would mean that many complex problems in fields such as cryptography, optimization, and algorithm design could be solved efficiently, drastically altering our approach to computation.
Conversely, if P is not equal to NP, it would affirm the inherent difficulty of these problems, implying that some problems can be verified quickly but not solved quickly. This unresolved question continues to be a central focus of theoretical computer science, with far-reaching implications across numerous domains.
Watch the following videos to learn more about the P vs. NP problem:
Reductions
This lesson is optional.
In complexity theory, the concept of reductions is fundamental for comparing the complexity of different problems. A reduction is a way of transforming one problem into another in such a manner that a solution to the second problem can be used to solve the first problem. This concept is used to show that one problem is at least as hard as another.
To see an example of a reduction, watch the following video:
Approximation Algorithms
Approximation algorithms are a vital tool in tackling NP-hard optimization problems, where finding an exact solution is computationally infeasible due to exponential time complexity.
These algorithms seek to find solutions that are close to the optimal within a factor of the true solution. The performance of an approximation algorithm is often measured by the approximation ratio, which compares the quality of the solution provided by the algorithm to the quality of the optimal solution.
For example, a $$ρ$$-approximation algorithm guarantees that the solution will be within a factor of $$ρ$$ of the optimal solution. This approach is especially useful in real-world scenarios where near-optimal solutions are sufficient and can be obtained in a reasonable amount of time.
The design of approximation algorithms involves various techniques, such as greedy methods, local search, and linear programming relaxations. Greedy algorithms build up a solution piece by piece, always choosing the next piece that offers the most immediate benefit. Local search methods iteratively improve an initial solution by making small, local changes. Linear programming relaxations involve solving a relaxed version of the problem, where some constraints are loosened to make it easier to solve, and then converting the relaxed solution into a feasible solution for the original problem.
Approximation algorithms have been successfully applied to numerous NP-hard problems, including the TSP. Watch the following video to learn more about approximation algorithms for the TSP:
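As a simple illustration of the greedy idea, here is a nearest-neighbor heuristic for the TSP; note that this is a heuristic rather than a constant-factor approximation algorithm in general, and it assumes an `n x n` distance matrix:

```python
def tsp_nearest_neighbor(dist, start=0):
    """Greedy heuristic: always travel to the closest unvisited city next.
    Runs in O(n^2) time but offers no general approximation guarantee."""
    n = len(dist)
    unvisited = set(range(n)) - {start}
    route, current, total = [start], start, 0
    while unvisited:
        nearest = min(unvisited, key=lambda city: dist[current][city])
        total += dist[current][nearest]
        unvisited.remove(nearest)
        route.append(nearest)
        current = nearest
    total += dist[current][start]     # return to the starting city
    route.append(start)
    return total, route
```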
Reading
Read about intractable problems in Chapter 12 of An Open Guide to Data Structures and Algorithms.
Week 9 Quiz
- Complete this week's quiz in Gradescope by navigating to the appropriate link.
- Export it as a pdf using the Google Chrome plugin: https://gofullpage.com/. This plugin will do multiple screen captures while scrolling through your document, and then stitch them into a pdf that you can download.
- Upload the generated pdf to Anchor below.
Note: Anchor submissions can occur at any time during the term, but it is critical that you upload all of your work to Anchor before the last day of the term. Gradescope submissions must be submitted before the deadline (or the late deadline, if applicable).
Assignment 5
Due date: June 14
📌 This is an individual assignment. You are expected to work independently.
If you get stuck, confused, or have trouble with the assignment, you should use the #help-dsa2 channel in Discord or message an instructor. Remember not to spoil the assignment for others - do not post code samples from your work to public spaces.
Traveling Salesman
In this assignment, you will complete a couple of algorithms related to the traveling salesman problem.
▶️ Access the assignment on GitHub Classroom
Remember...
- Read the instructions carefully
- Plan before you code
- Debug if you aren't getting the desired output
- Attend office hours if you need additional support
- Ask for help in Discord
Week 10
Welcome to week 10! In this week, we will focus on wrapping up assignments and will have a final exam.
What's due this week
- Final exam on Thursday
- Assignment 5, due Friday
- All other assignment late submissions, due Friday
Final Exam
We will have our final exam on Thursday, June 13 at 6:00 PM GMT. Here are the details:
Link to Exam
You can access the exam on OnlineExamMaker here: https://t.onlineexammaker.com/doexam/pQEOjonqLl1.html.
Date and time
- Held during Week 10 during our normal class time: Thursday, June 13 at 6:00 PM GMT
- You can access the link to the final exam ahead of time, but it won't be enabled until 6:00 PM GMT the day of the exam
Material
- Everything that was covered during the term, with an emphasis on weeks 5 - 9
- Excludes the following lessons:
- Floyd-Warshall algorithm (week 7)
- Stable marriage problem (week 7)
- Linear programming (week 8)
- Duality (week 8)
- Reductions (week 9)
Format
- Administered online using a tool called OnlineExamMaker
- You can prepare and use any amount of notes on paper, but no other resources on the computer or Internet are allowed
- 75 minute exam from the time you begin
- Be sure to start the test between 6:00 - 6:15 PM GMT (it will close after that point)
- Exam will be composed of 20 multiple choice questions, shown one at a time
- You can't go back and change your answer after moving on to the next question
- We won't be using Zoom during the exam; you can just access the exam individually when you're ready
- Plan to have scratch paper for tracing/diagramming etc.
Proctoring software
- The exam website is also a proctoring service
- It will ask you to turn your camera on, and will take some pictures at random during the test
- It will flag any activity such as closing the test, changing programs, switching tabs, etc. If you do this too many times, it will close the exam and you will not be able to continue. Once you start the exam, you should expect to stay on the exam page and complete each question without any interruptions or using your computer for anything else.
- It disables copy and paste during the test
- It gives us the option to monitor your activity
Exam environment
- Plan to be in a test-taking environment
- Quiet, mostly private
Practice exam
There is a practice exam that you can take here: https://t.onlineexammaker.com/doexam/6QZO0aDmLJz.html. It is the same type of questions that will be on the final exam, but is only half the length (10 questions).
Questions
If you have any questions leading up to the exam, please ask in #help-dsa2 on Discord or reach out to the instructor.