# Programming Exercises for Theory of Algorithms (BMETE91AM57/TA)

Oktató:
Kurzus típus:
Labor
Nyelv:
angol
Félév:
2020/21/2
Órarendi információ:

Cs 10:15-11:00

The aim of this subject is to maintain the programming skills of the students through the solution of the programming tasks related to the Theory of Algorithms, and at the same time to facilitate a better understanding of the algorithms and the efficiency and complexity of them.

"""
I encoded this program myself, did not copy or rewrite the code of others,
and did not give or send it to anyone else.

"""


You have to follow the next rules. You must not submit or look at solutions or program code that are not your own. It is an act of plagiarism to submit work that is copied or derived from the work of others and submitted as your own. You must not share your solution code with other students, nor ask others to share their solutions with you. You must not ask anybody to write code for you. But many forms of collaboration are acceptable and even encouraged. It is usually appropriate to ask others for hints and debugging help or to talk generally about problem solving strategies and program structure. In fact, we strongly encourage you to seek such assistance when you need it. Discuss ideas together, but do the coding on your own.

## Contents

1. Pattern matching (2021-02-20 20:00)
2. Nondeterministic finite automaton (2021-02-27 22:00)
3. Breadth first search (BFS) (2021-03-06 22:00)
4. Finding left derivation (2021-03-13 22:00)
5. Effective verifier (2021-03-20 22:00)
6. Euklidean algorithm (2021-03-27 22:00)
7. Brute force (2021-04-03 22:00)
8. Integer programming (2021-04-17 22:00)

## 1. Pattern maching

• Write a program for comparing the simple matching and the quick search algorithms!
1. The input for the program is given in a terminal command.
python3 1AdrianSmith.py text.text pattern


The program read the file text.text and search for the first occurrence of the given pattern in it. The command arguments can be read out from the inbuilt array sys.argv:

import sys

print('Filename: {0}, pattern: {1}.'.format(sys.argv[1], sys.argv[2]))
print("The name of this program has {0:2} characters".format(len(sys.argv[0])))
# {:2} would be equivalent


The sys.argv[0] contains the name of the program, the len() function gives the length of it. {0:2} means that the 0-th argument of the format()

function (that is the number of characters) is written into a two-character place in the output.

2. For reading a file use the file functions, e.g.
f = open(filename, 'r')
f.close()


The strip() method cut down the unnecessary starting and ending white spaces.

3. In your program you do not need to check if the specified text file exists or whether it actually contains text. The pattern is always at most as long as the text. The text contains lowercase letters of the English alphabet a-z and spaces. There are only lowercase letters in the pattern, but no spaces. According to the algorithms, the program allows the comparison of a single character of the text and the pattern, but does not use the Python 3 string functions except the len() function.
4. Your program have to run the simple matching and the quick search algorithms with the given text and pattern. The output of the program specifies whether the pattern is in the text, at which character position and how many comparisons have been made to the first occurrence. Give 5 character space to the number of comparisons, and 4 character spaces to the position. The position number of the first character must be 1 (not 0). The output have to be look like this:
Simple matching:  2610 comparison(s), the pattern starts at 2321
Quick search:      270 comparison(s), the pattern starts at 2321


Specify the number of comparisons even if the pattern does not appear in the text:

Simple matching: 53610 comparison(s), no pattern
Quick search:     5470 comparison(s), no pattern

• If you use the end="..." option in the print() function, the output of the next print() call will be continued in the same line.
• In Python 3 you may use special or accented lettes (e.g. you may write your name if it contains such letters of any language), but avoid such letters in the variable names, although it is possible.
• Try your program with the next downloadable text files and with the given patterns (of course in the test phase of your program, you may use your own very simple text files and patterns):
File Pattern
1.text overscrupulous
1.text zombies
2.text aaaaaaaaa
2.text bbbbbbbbb
2.text aaaaaaaab
2.text aaaaaaaba

## 2. Nondeterministic finite automaton

• The program automaton.py simulates the running of a simple non-deterministic finite automaton.
python3 automaton.py word


where word is the word you want to recognize.

2. The program uses the built-in set function to keep records of the currently available states in the computing tree of the non-deterministic finite automaton.
3. Draw the automaton and determine what language does the automaton recognize? Please, write the answer in a comment into the code:
# The original program recognize the language: ...


Do not attach the figure.

• Change the description of the automaton in the file automaton.py between the lines
# the description of the automaton starts here


and

# the description of the automaton ends here


so that the program accepts the words that contain even number of letters a or contain the string algpr (or both)! So e.g. 'aaalgprbbb' and 'asdaefaa' are accepted but 'aaalgbb' is not.

## 3. Breadth first search (BFS)

• Using the Breadth first search (BFS) algorithm known from the subject Combinatorics 1, we can find the shortest possible path in a graph between two given vertices. The task is to program this algorithm.
• The input of the algorithm is a simple directed graph, which is defined as follows in a text (.text) file:
1. The first line of the file contains the natural numbers $N$, $M$, $S$, $T$ separated by spaces.
• The set of vertices of the simple directed graph $\vec{G} = (V, \vec{E})$ is $V = \{0, 1, 2, \ldots, N-1\}$.
• The number of edges is $\left| \vec{E} \right| = M$.
• We would like to find the shortest directed path from the vertex $S \in V$ to the vertex $T \in V$ ($S \ne T$).
2. The next $M$ lines contain "$U_i$ $V_i$" pairs of numbers. Such a pair represent the edge $(U_i, V_i) \in \vec{E}$. An edge appears only once in the file.
3. If there is a path from $S$ to $T$, the program lists the vertices starting with $S$ and finished with $T$. If there is no path, it prints out "no path".
• As a help we give a file graph.py, where we provide the input function of the graph and the output function of the path. So the search function search_path has to be written.
1. The first parameter of this funcion is graph. This is the edge-list representation of the graph. The edge-list is given in a dictionary, where a list is belonging to every vertex. The elements in the list are vertices available by the outgoing edges. For example look at the graph of 6 vertices and 8 edges given in the next file
6 8 2 5
1 0
2 1
3 2
1 3
0 5
2 4
4 5
4 3


The next edge-list belongs to this graph

graph = {0: [5], 1: [0, 3], 2: [1, 4], 3: [2], 4: [5, 3], 5: []}

2. The next parameter of the function from_where gives the first vertex $S$ and the third parameter to_where gives the last vertex $T$.
3. You may use the read_path function, which read out a path from a (not necessarily complete) BFS-tree between the from_where and to_where vertices. The BFS-tree is given in the parent dictionary, where the value assigned to a vertex is the number of its parent in the tree. For example in the case of
parent = {1: 2, 4: 2, 0: 1, 3: 1, 5: 4}
from_where = 2
to_where = 5


the shortest path is [2, 4, 5].

4. When implementing the algorithm, it is not necessary to split the edges into classes (forward, backward,...), and not necessary to search in the whole graph, it is enough to find one shortest path.
5. Buil-in list functions like append and pop(0) can be used to implement a queue. (But one may try the modul queue.)
• Test your program on the files graf1.text and graf2.text files, but also on other self-made input files!

## 4. Finding left derivation

• The breadth first search is suitable for deriving a word in a context-free grammar.
1. Consider the CF grammar $G=(V,\Sigma,R,S)$, and construct the directed graph $\vec{G}=((V\cup\Sigma)^*,\vec{E})$ where the vertices are the strings of terminal and non-terminal symbols. This graph has countably infinite number of vertices, but it is enough to search in a finite part of it for finding a derivation of a word.
2. The graph $\vec{G}$ contains the edge $(\alpha, \beta)$ if and only if there is a left-derivation step from the string $\alpha \in (V \cup \Sigma)^*$ to the string $\beta \in (V \cup \Sigma)^*$, in other words we get $\beta$ if we apply a substitution rule from $R$ on the first non-terminal symbol of $\alpha$. Formally \begin{equation*} (\alpha, \beta) \in \vec{E} \Longleftrightarrow \exists w \in \Sigma^* \; \exists \gamma, \delta \in (V \cup \Sigma)^* \; \exists A \in V \colon (A \rightarrow \delta) \in R, \alpha = wA\gamma \textrm{ and } \beta = w\delta\gamma. \end{equation*}
3. In this graph the left derivations of the word $w \in L(G)$ are exactly the $S \leadsto w$ paths, where $S \in V$ is the start variable. All edges of a path correspond to a rule applied on the leftmost non-terminal symbol. We want to find such a path in this task.
• We give the skeleton of the program in the file CF.py. The only command line argument of this program is a word to derive. The prgram prints out the derivation to the standard output.

The separation of the terminal and non terminal symbols is solved by a simple method: all capital letters A to Z of the English alphabet are possible non-terminal symbols while all other characters are terminal.

1. The CF grammar is represented by the class Grammar. The start symbol is given to the constructor argument start_variable. Complete the add_rule method such that it adds the left and right side of a rule to the language. If necessary, modify the __init__ constructor creating a suitable data structure to store the rules.
2. The derive method gets a word as a parameter and returns a left derivation in a list. If there is surely no such derivation, it returns False. To do this, it uses the breadth first search (presented in the previuos task) algorithm realized by the private function _bfs_search. Because the graph is now infinite, we can not list all edges. Instead, we use the _next method producing a list of vertices that can be reached on an edge from the string vertex in the graph.

Complete the _next method such that it returns a list of words which are available from the received string vertex by a single left derivation rule. For example, if the grammar is \begin{equation*} S \to aSB \mid bSA \mid \epsilon, \quad A \to aA \mid \epsilon, \quad B \to bB \mid \epsilon, \end{equation*} then

self._next('aSbbB') == ['aaSBbbB', 'abSAbbB', 'abbB']

3. Let's try the program and find derivation to some given words. Please, answer the following questions in a """-type comment at the end of the code! (1) What language is specified in the file CF.py? (2) What happens when we give a word to the program that is not in the language? (3) A non mandatory question and task: how could we change the program code to handle this case? Do it!
• The isinstance() function and islower(), extend(), append() methods might help.

## 5. Effective verifier

• The task in this week is to write a Python program that implements an effective verifier algorithm deciding on a given graph that a given sequence of numbers represents a Hamiltonian cycle in the graph. (This task is preparing for a later topic in Algorithm theory where you will learn that the language HAM of graphs containing Hamiltonian cycles is in NP.)
• It is easy to decide for a graph and a sequence of numbers that the sequence forms a Hamiltonian cycle in the graph – one just have to check that each vertex occurs exactly once and there are edges between each pair of consecutive vertices, and between the first and last vertex.
• The program receives the input in the form of command line arguments.
    python3 5AdrianSmith.py graf.text witness.text

• The first input (in the example graf.text) is a non directed simple graph that we enter in a text (.text) file
1. There are two natural numbers separated by spaces in the first line $N$ and $M$ where
• the simple graph $G = (V, E)$ has $N$ vertices, namely $V = \{0, 1, 2, \ldots, N-1\}$,
• and has $M$ edges, that is $\left| E \right| = M$.
2. The next $M$ lines of the file contains pairs of numbers $U_i, V_i$. One such pair corresponds to a (non directed) edge $(U_i, V_i)$. Every edge is listed only once in the file.
3. To process the input we recommend modifying the code of the appropriate part of the 3rd task. Note that the first line of the file here does not contain the vertices $S$ and $T$. Since the edges of the graph are not directed, you may want to add edges to the edge list of both endpoints!
• The second input of the program is a sequence of integers. The task of the program is to decide whether the sequence of numbers represents a Hamiltonian cycle in the graph (verify whether the sequence is a witness of the existence of a Hamiltonian cycle).
• If it is a Hamiltonian cycle (a witness) print the text
witness


otherwise print

not a witness

• Try the program with the file graf.text and with the next input files:
Filename Expected output
t1.text witness
t2.text not a witness
t3.text not a witness
t4.text not a witness
t5.text witness
t6.text not a witness
t7.text not a witness

## 6. Euklidean algorithm

• The Euclidean algorithm, or Euclid's algorithm, is an efficient method for computing the greatest common divisor of two numbers. If the input $a, b \in \mathbb{Z}^+$ are two binary numbers, then the length of input is $O(\log a + \log b)$, and the Euclidean algorithm calculate the value $\mathop{\mathrm{gcd}}(a, b)$ in at most $O(\log a + \log b)$ steps.
• In the file euklidean.py a part of a program is given which plot the heat map of the function $\log a + \log b$ on the $(a, b)$ coordinate-plane with the matplotlib Python package. The values of $a$ and $b$ are the integer numbers between $1$ and $100$. The heat map shows the value of the numbers with the colors of the rainbow (violet, indigo, blue, green, yellow, orange, red). The older versions of the matplotlib package uses violet for small and red for big numbers, but in the new versions the colors are changing from violet to yellow only.
• Modify the program to show the running of the euclidean algorithm! Plot three heat maps side by side in a window similar to one of the following. With the new version of matplotlib installed on your computer the result of the home work looks like this:

with the old version like this

The following functions should be shown on the heat maps:

1. The first heat map shows the function $\log_2 a + \log_2 b$, which is generated by the program downlaodable above.
2. The second heat map shows the number of steps of the Euclidean algorithm on each pair $(a, b)$. On the number of steps we mean: how many divisions with reminder made on $(a_i, b_i)$ pairs or decide to stop
3. The third heat map shows the value $\mathop{\mathrm{gcd}}(a, b)$.
• The subplot function of matplotlib package can be used to display multiple figures. The argument of this function is a 3-digit number $nmk$, where the $n$ is the number of rows, $m$ is the number of columns of subfigures on the canvas. The third $k$ is the index of the subfigure, which starts at 1 in the upper left corner and increases to the right. For example
plt.subplot(132)


means that the canvas has 3 subfigures in a row, and all the results of the function calls of plt appear on the second subfigure.

## 7. Brute force

• One possible way to solve NP-complete problems is to use "brute force." In this approach, we generate all possible witnesses according to the input and check them one by one. If we find a good witness, then the input is in the tested language, but if none of the generated witnesses are good, then the input is certainly not an element of the language. Of course, we cannot expect this procedure to solve the problem in polynomial time.
• The language $\textrm{IND}$ consists of pairs $(G, k)$ where a simple non-directed graph $G$ has $k$ independent vertices (an independent set is also called stable set, coclique or anticlique). It is a good witness for $(G, k) \in \textrm{IND}$ if there is a set $F \subseteq V(G)$ of $k$ independent vertices. If $|V(G)| = n$, then it is possible to decide in $O(n^2)$ time on a set $F$ of points if there is no edge between the points of $F$.
• Write a program that determines whether $(G, k)$ is in the $\textrm{IND}$ language. $G$ and $k$ is described in a file given in a command line argument. If you find an independent $k$-element vertex set, write it to the standard output. Otherwise indicate that there is no independent set of $k$ elements in the graph.
1. In the file ind.py a function read_graph is given which read the edges of $G$ and the number $k$ from the file given in the command line argument. The format of the input file and the return values described in the docstring of the source code.
2. Generate all $k$-element combinations of the vertex set $\{0, 1, \ldots, n - 1\}$ of the graph and decide if any of them is independent. Stop the search if you find an independent set and write it to the standard output. If there is no independent set indicate on the standard output that there is no independent $k$-set.
3. To help coding we provide the function combinations which write the 3-element combinations of the set $\{0, 1, \ldots, 9\}$ to the standard output using the itertools package. (Of course, use the ideas of this function only, not the function itself.)
• Try your program with the next files! For the inputs belonging to the language $\textrm{IND}$ we give an independent set as a verifier. Of course any other independent set will be accepted as a witness.
File Independent set
ftln1.text 1 2 7
ftln2.text no
ftln3.text 0 4 9 11 12 14 19
ftln4.text no
ftln5.text 2 5 18 29 39
ftln6.text no

## 8. Integer programming

• Another known method for solving NP-complete problems is to bring the problem back to another well-known NP-complete problem. If we have an effective heuristic solver for a well-known problem, we can usually achieve better results than with brute force. In this exercise, we slightly extend the previous week's IND problem, and solve the MAXIND problem using integer linear programming tools. In IND we had to decide if an independent $k$-set in a given graph exists or not. In MAXIND we have to find the max value of $k$ for which an independent $k$-set exists. We solve the integer linear program with the PuLP Python package. Install it with the built-in package manager pip (or pip3) using the command
pip install pulp

1. Read the input of the problem, that is the simple non-directed graph $G$ and $k$ from a file having the same format as in the previous exercise, although $k$ will not be used here. As with the previous task, the program will receive the file name as a command line argument.