Submodular Optimization and Document Summarization

2023/06/20

In this blog post, we study submodular optimization and its application to knapsack-type problems, more specifically the document summarization problem. Submodular functions, which exhibit properties akin to convexity, enable simple yet effective algorithms that approximate solutions to notoriously difficult NP-hard discrete optimization problems.

Submodular Optimization

Let’s start by understanding the concept of submodularity. There are two equivalent definitions that characterize submodular functions. The first definition states that a function $f: 2^V \rightarrow \mathbb{R}$ is submodular if, for any subsets $A$ and $B$ in the power set $2^V$, the following inequality holds:

$$ f(A) + f(B) \geq f(A\cap B) + f(A \cup B) $$

Alternatively, a more intuitive definition of submodularity is expressed through the law of diminishing returns. According to this definition, a function $f: 2^V \rightarrow \mathbb{R}$ is submodular if and only if it satisfies the following property:

$$ f(A\cup \{e\}) - f(A) \geq f(B \cup \{e\}) - f(B) \quad \text{for all } A \subseteq B \subseteq V \text{ and } e \in V \setminus B $$

In simpler terms, submodularity implies that the incremental value added by including an element in a set decreases as the size of the set increases.
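To make the diminishing-returns property concrete, here is a minimal Python illustration using set coverage, a classic monotone submodular function. The sets and names below are made up purely for this example:

```python
# Toy ground set: each element "covers" a set of items.
# Coverage f(S) = number of distinct items covered by S is monotone submodular.
sets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

def f(S):
    """f(S) = size of the union of the sets indexed by S."""
    covered = set()
    for name in S:
        covered |= sets[name]
    return len(covered)

def gain(S, e):
    """Marginal value of adding element e to the set S."""
    return f(S | {e}) - f(S)

A = {"a"}          # small set
B = {"a", "b"}     # superset of A
# Diminishing returns: adding "c" helps the smaller set at least as much.
print(gain(A, "c"), gain(B, "c"))  # → 3 2
```

As expected, the marginal gain of `"c"` shrinks as the base set grows, which is exactly the second definition above.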

Submodular functions possess a natural property of diminishing returns, making them well-suited for various applications such as approximation algorithms, game theory (as user preference models), and electrical networks.

Submodular Maximization

Submodular functions exhibit similarities to both convex and concave functions, which lets us treat many discrete optimization problems as the maximization or minimization of submodular functions under certain constraints, much as continuous problems are framed in terms of convex or concave objectives.

Consider a set of objects $V = \{v_1, \dots, v_n\}$ and a function $F : 2^V \rightarrow \mathbb{R}$ that assigns a real value to any subset $S \subseteq V$. Our goal is to find the subset $S$ of bounded size, $|S| \leq k$, that maximizes the function:

$$ \arg\max_{S \subseteq V,\ |S| \leq k} F(S) $$

Unfortunately, this optimization problem is generally NP-hard, posing a significant challenge despite its prevalence in numerous important applications. Even when the function $F$ is monotone submodular, the maximization remains NP-hard. Nevertheless, there exist efficient algorithms that provide near-optimal solutions for this class of functions. For instance, a well-known result states that maximizing a monotone submodular function subject to a cardinality constraint can be achieved using a greedy algorithm (Nemhauser et al., 1978) within a constant factor $(1 - 1/e \approx 0.63)$ of the optimal solution.
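The cardinality-constrained greedy can be sketched in a few lines of plain Python. `greedy_max` and the toy coverage function below are illustrative names, not part of any library:

```python
def greedy_max(f, V, k):
    """Greedy maximization of a monotone submodular f under |S| <= k.
    Achieves a (1 - 1/e) approximation (Nemhauser et al., 1978)."""
    S = set()
    for _ in range(k):
        # Pick the element with the largest marginal gain.
        best = max((e for e in V if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        S.add(best)
    return S

# Example: maximum coverage with k = 2 on a toy instance.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5, 6}}
f = lambda S: len(set().union(*(sets[e] for e in S)))
S = greedy_max(f, set(sets), 2)
print(f(S))  # → 5
```

At each step the algorithm simply takes the element with the best marginal gain, which is exactly where the diminishing-returns property pays off.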

In this article, we will focus on the knapsack optimization problem under a budget constraint, which can be formally stated as follows:

$$ \max_{S \subseteq V} F(S) \quad \text{subject to} \quad \sum_{i \in S} c_i \leq \mathcal{B} $$

Here, $V$ represents the ground set of elements, $S$ denotes a subset of $V$, $c_i$ represents the non-negative cost associated with selecting element $i$, and $\mathcal{B}$ represents our budget.

We are fortunate that submodularity arises naturally in various discrete domains, akin to how convexity manifests in the continuous domain. In this article, we will explore document summarization.

Algorithm

We will now introduce two algorithms that address the aforementioned optimization problem. The first algorithm applies specifically to monotone submodular functions in the context of knapsack problems with a budget constraint. This algorithm was proposed by Hui Lin and Jeff Bilmes in their paper titled Multi-document Summarization via Budgeted Maximization of Submodular Functions. The algorithm is outlined below:

  1. Initialize an empty set $G$.
  2. Set $U \leftarrow V$.
  3. While $U \neq \emptyset$, perform the following steps:
    • Select $k$ as $\arg\max_{l \in U} \frac{f(G\cup\{l\})-f(G)}{(c_l)^r}$.
    • If $\sum_{i\in G} c_i + c_k \leq B$ and $f(G \cup \{k\}) - f(G) \geq 0$, then:
      • Add $k$ to $G$: $G \leftarrow G \cup \{k\}$.
    • Remove $k$ from $U$: $U \leftarrow U \setminus \{k\}$.
  4. Set $v^*$ as $\arg\max_{v\in V,\, c_v \leq B} f(\{v\})$.
  5. Return $G_f = \arg\max_{S\in \{\{v^*\}, G\}} f(S)$.
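The steps above can be sketched in plain Python as follows. This is a toy re-implementation for illustration, not the pysubopt code, and the coverage example at the bottom is made up:

```python
def budgeted_greedy(f, costs, budget, r=1.0):
    """Cost-scaled greedy for maximizing a monotone submodular f
    under a budget constraint (sketch of Lin & Bilmes's algorithm)."""
    V = set(costs)
    G, U = set(), set(V)
    while U:
        # Element with the best cost-scaled marginal gain.
        k = max(U, key=lambda l: (f(G | {l}) - f(G)) / costs[l] ** r)
        if sum(costs[i] for i in G) + costs[k] <= budget and f(G | {k}) - f(G) >= 0:
            G = G | {k}
        U.discard(k)
    # Compare against the best feasible singleton.
    singles = [v for v in V if costs[v] <= budget]
    if singles:
        v_star = max(singles, key=lambda v: f({v}))
        if f({v_star}) > f(G):
            return {v_star}
    return G

# Toy coverage instance: maximize covered items under a word-count budget.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
costs = {"a": 2, "b": 1, "c": 2}
f = lambda S: len(set().union(*(sets[e] for e in S)))
print(sorted(budgeted_greedy(f, costs, budget=3)))  # → ['a', 'b']
```

The final comparison with the best singleton (steps 4–5) is what gives the algorithm its constant-factor guarantee for monotone submodular objectives.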

There are other algorithms available (such as the double greedy algorithm) that can be applied to non-monotone submodular functions. However, for the purposes of this article, we will focus on the aforementioned algorithm. To facilitate its implementation, I have developed a Python package called pysubopt (available at https://github.com/mcordier/pysubopt). You can utilize this package as shown in the example below:

from opt_submodular import GreedySubmodularOptimizer

def mono_sub_func(S):
    # Define your submodular function implementation
    res = ...
    return res

def cost(S):
    # Define the cost function
    res = ...
    return res

budget = ...

optimizer = GreedySubmodularOptimizer(
    fun=mono_sub_func,
    cost_fun=cost,
    budget=budget,
    r=1.,
    is_lazy=True,
)

# Further usage and experimentation with the optimizer

By leveraging the pysubopt package and the provided example, you can explore submodular optimization for your document summarization and contagion maximization problems. This algorithm serves as a starting point to approximate near-optimal solutions, offering a valuable tool for tackling NP-hard discrete optimization challenges.

Document Summarization

Document summarization can be modeled as the maximization of a function that measures how well a summary covers the document. The functions studied in the past are generally submodular. Building on this, Lin and Bilmes (2011) designed a class of submodular functions for document summarization that enforces fidelity and rewards diversity. This function is monotone and submodular, so a simple greedy algorithm guarantees a near-optimal solution.

In this study, we will compare the performance of this function with another function, $f_{MMR}$, which is submodular but non-monotone, on a new dataset (Opinosis).

Problem Formulation

We first introduce the following notation: $V$ is the ground set of sentences extracted from the document(s), $c(S) = \sum_{i \in S} c_i$ is the cost of a candidate summary $S$ (e.g., its number of words), and $B$ is the budget.

Thus, we can formulate the problem as follows: $$ S^* \in \arg\max_{S\subseteq V} f(S) \qquad \text{subject to} \quad c(S) \leq B $$

If $f$ is submodular, then we can apply the previously described algorithm.

A non-monotone submodular function: the $f_{MMR}$ function

We define $f_{MMR}$ as follows:

$$ \forall S \subseteq V, \quad f_{MMR}(S) = \sum_{i\in V} \sum_{j \in S} w_{i,j} - \lambda \sum_{i,j \in S: i \neq j} w_{i,j}, \qquad \lambda\geq 0 $$

where $w_{i,j}$ is the cosine similarity between the TF-IDF vectors $X^{tfidf}_{i}$ and $X_j^{tfidf}$.

This function is submodular, but not monotone.
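As a sketch, $f_{MMR}$ is straightforward to compute once the sentence-similarity matrix is available. The function name and arguments below are illustrative; `W` is assumed to be a precomputed matrix of pairwise cosine similarities between TF-IDF vectors:

```python
import numpy as np

def f_mmr(S, W, lam=0.5):
    """Sketch of f_MMR: relevance of S to the whole document minus a
    redundancy penalty. W is the (n x n) cosine-similarity matrix of the
    sentences' TF-IDF vectors; S is a set of sentence indices."""
    S = sorted(S)
    if not S:
        return 0.0
    coverage = W[:, S].sum()                 # sum_{i in V} sum_{j in S} w_ij
    Wss = W[np.ix_(S, S)]
    redundancy = Wss.sum() - np.trace(Wss)   # pairs i != j inside S
    return coverage - lam * redundancy
```

Because the redundancy penalty grows with $|S|$, adding a sentence can decrease the value, which is why the function is not monotone.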

A monotone submodular function : $\mathcal{F}_{divcov}$ function

We say that $\mathcal{F} : 2^V\rightarrow \mathbb{R}$ is a coverage-diversity submodular function if it has the following form: $$ \begin{aligned} \forall S \in 2^V, \mathcal{F}(S)& =\mathcal{L}(S) + \lambda \mathcal{R}(S) \\ & = \sum_{i \in V} \min \left( C_i(S), \alpha C_i(V) \right) + \lambda \sum_{i=1}^K \sqrt{a_i} \end{aligned} $$

where $a_i = \sum_{j \in P_{i} \cap S} r_{j}$, $\lambda \geq 0$, $C_i: 2^V \rightarrow \mathbb{R}$ is a monotone submodular function, and $r_j$ is a singleton reward representing the importance of sentence $j$ in the summary. $(P_i)_{i=1\dots K}$ is a partition of the set $V$. Thus, $\mathcal{L}$ measures the coverage ("fidelity") and $\mathcal{R}$ rewards diversity.

In Lin and Bilmes (2011), $\forall i, C_i(S) = \sum_{j \in S} w_{i,j}$ and $\forall j, r_j = \frac{1}{N} \sum_{i \in V} w_{i,j}$, where $w_{i,j}$ is the cosine similarity between sentences $i$ and $j$.

Thus, we reuse this formulation and call $F_{covdiv}$ the following function:

$$ F_{covdiv}(S) = \sum_{i \in V} \min\left(\sum_{j \in S} w_{i,j},\ \alpha \sum_{j \in V} w_{i,j}\right) + \lambda \sum_{i=1}^K \sqrt{\sum_{j \in P_i \cap S} \frac{1}{N} \sum_{k\in V} w_{k,j}} $$

where $w_{i,j}$ is the cosine similarity between the TF-IDF vectors $X^{tfidf}_i$ and $X^{tfidf}_j$.

To define and optimize a coverage-diversity function, we need to compute a clustering that yields the partition. In the experiment, we will use a k-means algorithm to cluster the sentences. Let's now focus on each term $\mathcal{L}$ and $\mathcal{R}$, and show that both are monotone and submodular.
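Here is a minimal sketch of $F_{covdiv}$, assuming the similarity matrix `W` has already been computed and the `partition` is a list of index clusters (e.g., from k-means over TF-IDF vectors). The function name and signature are illustrative, not the pysubopt API:

```python
import numpy as np

def f_covdiv(S, W, partition, alpha=0.75, lam=1.0):
    """Sketch of F_covdiv: coverage term L(S) plus diversity reward R(S).
    W is the (N x N) sentence-similarity matrix; `partition` is a list of
    clusters of sentence indices (e.g., obtained with k-means)."""
    N = W.shape[0]
    S = sorted(S)
    S_set = set(S)
    # L(S) = sum_i min(sum_{j in S} w_ij, alpha * sum_{j in V} w_ij)
    cov_S = W[:, S].sum(axis=1) if S else np.zeros(N)
    cov_V = W.sum(axis=1)
    L = np.minimum(cov_S, alpha * cov_V).sum()
    # R(S) = sum_k sqrt(sum_{j in P_k ∩ S} r_j), with r_j = (1/N) sum_i w_ij
    r = W.sum(axis=0) / N
    R = sum(np.sqrt(r[[j for j in P if j in S_set]].sum()) for P in partition)
    return L + lam * R
```

The square root in the diversity term means picking a second sentence from an already-represented cluster pays less than opening a new cluster, which is how the function rewards diversity.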

Coverage function

The function $\mathcal{L}: 2^V \rightarrow \mathbb{R}$ defined by $\forall S \subseteq V, \mathcal{L}(S) = \sum_{i \in V} \min\left(\sum_{j \in S} w_{i,j}, \alpha \sum_{j \in V} w_{i,j}\right)$ is non-decreasing submodular.

Proof

The non-decreasing property is trivial, since the $w_{i,j}$ are non-negative. For submodularity, note that each term $\min\left(\sum_{j \in S} w_{i,j}, \alpha \sum_{j \in V} w_{i,j}\right)$ is a modular (additive) function of $S$ truncated at a constant; such a truncation of a non-negative modular function is submodular, and a sum of submodular functions is submodular.

Diversity function

For a partition $(P_k)_{k=1\dots K}$, the function $\mathcal{R}: 2^V \rightarrow \mathbb{R}$ defined by $\forall S \subseteq V, \mathcal{R}(S) = \sum_{k=1}^K \sqrt{\sum_{j \in P_k \cap S} \frac{1}{N} \sum_{i\in V} w_{i,j}}$ is non-decreasing submodular.

Proof

Again, the non-decreasing property is trivial. For submodularity, each term is the composition $\sqrt{m_k(S)}$ of the concave, non-decreasing function $\sqrt{\cdot}$ with the non-negative modular function $m_k(S) = \sum_{j \in P_k \cap S} r_j$; a concave function of a non-negative modular function is submodular, and the sum over $k$ preserves submodularity.

Implementation

In the repo https://github.com/mcordier/pysubopt, we provide an implementation of this document summarization problem. Once installed, you can use the following command to test a summarization:

doc-summarize mmr-greedy --path path/to/document.txt --budget=10

It will return sentences which summarize the selected document.

For instance, running it on a random Wikipedia page about Louis IX with a limited budget of 100 words produces a summary that is not bad considering the simplicity of the algorithm.

Conclusion

In conclusion, submodularity provides a powerful framework for document summarization. By treating summarization as an optimization problem, submodular functions offer a systematic approach to evaluate the quality of summaries.

This blog post discussed two commonly used submodular functions: $f_{MMR}$ and $\mathcal{F}_{covdiv}$. The $f_{MMR}$ function balances sentence inclusion and redundancy, while the $\mathcal{F}_{covdiv}$ function promotes coverage and diversity through sentence clustering. For more details and other applications of submodularity, check out the full package at https://github.com/mcordier/pysubopt.

Thank you for reading, and stay tuned for more!