21.4. NP
NP is a larger class of formal languages than P. It can be defined in at least two equivalent ways.
Non-Deterministic Polynomial Time
One way to define it is as the class of all formal languages that can be decided in polynomial time by a non-deterministic Turing machine. Indeed, NP stands for "Non-deterministic Polynomial time". Recall the concept of a non-deterministic finite automaton (NFA) from CSCI 3136. An NFA can make arbitrary decisions that do not depend on the input as part of its computation. For example, it can choose to follow \(\epsilon\)-transitions or not. Thus, the answer of the NFA is no longer uniquely determined by the input. We defined the language decided by the NFA as the language of all strings for which there exists such a set of choices that lead the NFA to accept the string. A non-deterministic Turing machine is an analogous extension of a Turing machine: it can make arbitrary choices that don't depend on the input, and a string belongs to the language decided by the Turing machine if there exists such a set of choices that lead the Turing machine to accept the string.
Polynomial-Time Verification
Most students do not feel very comfortable with the concept of non-determinism. So you may like the following, completely equivalent definition of NP in terms of deterministic Turing machines better.
Consider a language \(\mathcal{L} \subseteq \Sigma^*\) and let \(\mathcal{L}' \subseteq \Sigma^*\) be a language such that \(\sigma \in \mathcal{L}\) if and only if there exists a string \(\tau \in \Sigma^*\) such that \(\sigma\tau \in \mathcal{L}'\). You can think about \(\tau\) as a "proof" that \(\sigma \in \mathcal{L}\) and about \(\mathcal{L}'\) as the language of all pairs \((\sigma, \tau)\) such that \(\tau\) is a valid proof of \(\sigma\)'s membership in \(\mathcal{L}\). Correspondingly, we say that any Turing Machine \(M\) that decides \(\mathcal{L}'\) verifies \(\mathcal{L}\): Given a pair \((\sigma, \tau)\), it verifies whether \(\tau\) is a valid proof of \(\sigma\)'s membership in \(\mathcal{L}\). It may be that \(M\) rejects the input \((\sigma, \tau)\) even though \(\sigma \in \mathcal{L}\). This happens when \((\sigma, \tau) \notin \mathcal{L}'\). This only means that \(\tau\) fails to prove that \(\sigma \in \mathcal{L}\), not that \(\sigma \notin \mathcal{L}\). In particular, there may exist another string \(\tau' \in \Sigma^*\) such that \((\sigma, \tau') \in \mathcal{L}'\).
As an example, we can choose \(\mathcal{L}\) to be the language of all (encodings of) graph-integer pairs \((G, k)\) such that \(G\) has a vertex cover of size \(k\). For every string \(\sigma \in \mathcal{L}\) encoding a graph \(G\) such that \((G, k)\) is a yes-instance, choose a set of at most \(k\) vertices that cover all the edges of \(G\), and let \(\tau\) be an appropriate encoding of the identities of these vertices. Then \(\mathcal{L}'\) is the language of all such pairs \((\sigma, \tau)\). It is easy to construct a deterministic Turing Machine that takes such a pair of strings \((\sigma, \tau)\) and decides in polynomial time whether \(\tau\) encodes a set of at most \(k\) vertices and whether these \(k\) vertices cover all edges in the graph \(G\) encoded by \(\sigma\). In contrast, we don't know whether there exists a deterministic Turing Machine that decides in polynomial time whether \((G, k)\) is a yes-instance. In fact, most people believe that this is highly unlikely.
We say that a language \(\mathcal{L}\) can be verified in polynomial time if there exists a language \(\mathcal{L}' \in \textrm{P}\) of pairs \((\sigma, \tau)\) and a constant \(c\) such that \(\sigma \in \mathcal{L}\) if and only if there exists a string \(\tau \in \Sigma^*\) of length \(|\tau| \le |\sigma|^c\) such that \((\sigma, \tau) \in \mathcal{L}'\).
In other words, for a language \(\mathcal{L}\) to be verifiable in polynomial time, we only need to be able to decide in polynomial time whether some polynomial-length string \(\tau\) proves that a string \(\sigma\) belongs to \(\mathcal{L}\). We do not need to be able to find such a string efficiently or decide directly whether \(\sigma \in \mathcal{L}\).
The following is a definition of NP in terms of polynomial-time verification of formal languages:
NP is the class of all formal languages that can be verified in polynomial time.
Why do we need both conditions that \(\mathcal{L}' \in \mathrm{P}\) and that the string \(\tau\) that proves that \(\sigma \in \mathcal{L}\) (if such a string exists) has length polynomial in \(|\sigma|\)?
The first part is obvious: We want to perform the verification in polynomial time using a deterministic Turing machine, so \(\mathcal{L'}\) better be in P.
The second part is more subtle: The input to the Turing machine \(M\) that decides \(\mathcal{L}'\) is the pair \((\sigma, \tau)\), and \(\mathcal{L'}\) being in P guarantees only that \(M\) runs in time polynomial in \(|\sigma| + |\tau|\). However, we want the verification to take time polynomial in \(|\sigma|\). Thus, we need to ensure that the input to \(M\) has length polynomial in \(|\sigma|\), that \(|\tau| \le |\sigma|^c\), for some constant \(c\).
Equivalence of the Two Definitions of NP
Why are the two definitions of NP in terms of deciding a language in non-deterministic polynomial time or verifying the language in deterministic polynomial time equivalent? It is easy to prove that a language can be verified in deterministic polynomial time if and only if it can be decided in non-deterministic polynomial time:
From Non-Deterministic Decision to Deterministic Verification
Assume we have a non-deterministic Turing Machine \(M\) that decides the language \(\mathcal{L}\). Also assume that \(M\) produces an answer in at most \(|\sigma|^c\) time for any string \(\sigma \in \Sigma^*\) and whenever \(M\) makes a non-deterministic choice, it chooses from a constant number \(a\) of options. The choices made by \(M\) during some computation of length at most \(|\sigma|^c\) can be encoded using an \(a\)-ary number \(\tau\) with at most \(|\sigma|^c\) digits. We define a language \(\mathcal{L}'\) consisting of all pairs \((\sigma, \tau)\) such that \(M\) accepts \(\sigma\) if it makes the choices encoded by \(\tau\). This guarantees that \(\sigma \in \mathcal{L}\) if and only if there exists a string \(\tau\) such that \((\sigma, \tau) \in \mathcal{L}'\). Moreover, if \(\sigma \in \mathcal{L}\), then there exists a string \(\tau\) of length \(|\tau| \le |\sigma|^c\) such that \((\sigma, \tau) \in \mathcal{L}'\). Thus, any deterministic Turing machine that runs in polynomial time and decides \(\mathcal{L}'\) provides a polynomial-time verification of \(\mathcal{L}\).
Such a Turing machine \(M'\) is easy to construct: Given an input \((\sigma, \tau)\), \(M'\) performs the same computation as \(M\) performs on \(\sigma\). Whenever \(M\) would make a non-deterministic choice, \(M'\) reads the next digit from \(\tau\) and (deterministically) chooses the corresponding choice from the options available to \(M\). This guarantees that \(M'\) runs in polynomial time. It also guarantees that \(M'\) accepts \((\sigma, \tau)\) if and only if the choices encoded by \(\tau\) lead \(M\) to accept \(\sigma\), that is, if \((\sigma, \tau) \in \mathcal{L}'\)—\(M'\) decides \(\mathcal{L}'\).
From Deterministic Verification to Non-Deterministic Decision
For the other direction, assume we have constants \(c\) and \(d\), a language \(\mathcal{L}'\), and a deterministic Turing Machine \(M\) such that
-
\(\sigma \in \mathcal{L}\) if and only if there exists a string \(\tau\) of length \(|\tau| \le |\sigma|^c\) such that \((\sigma, \tau) \in \mathcal{L}'\),
-
\(M\) decides \(\mathcal{L}'\), and
-
Given any string \(\rho\), \(M\) produces an answer in at most \(|\rho|^d\) time.
We obtain a non-deterministic Turing Machine \(M'\) that decides \(\mathcal{L}\) in polynomial time as follows:
Given an input \(\sigma \in \Sigma^*\), \(M'\) starts by generating a string \(\tau \in \Sigma^*\) of length \(|\tau| \le |\sigma|^c\). It "guesses" this string using its ability to make non-deterministic choices. It then runs \(M\) on the input \((\sigma, \tau)\) and returns the result. This machine clearly runs in polynomial time in \(|\sigma|\). Does it decide \(\mathcal{L}\)?
Given a string \(\sigma \in \mathcal{L}\), there exists a string \(\tau\) of length at most \(|\sigma|^c\) such that \((\sigma, \tau) \in \mathcal{L}'\). One possible set of choices that \(M'\) is able to make generates exactly this string \(\tau\) and running \(M\) on the resulting string \((\sigma, \tau)\) then leads \(M'\) to accept \(\sigma\).
If \(\sigma \notin \mathcal{L}\), then no matter which string \(\tau\) of length at most \(|\sigma|^c\) \(M'\) guesses, we have \((\sigma, \tau) \notin \mathcal{L}'\). Thus, running \(M\) on \((\sigma, \tau)\) leads \(M'\) to reject \(\sigma\). Therefore, \(M'\) rejects \(\sigma\) for any possible set of non-deterministic choices it makes.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.