Exercises with Convex Sets

Working through a few practice problems related to Convexity.
notes · Published May 27, 2022

Is the set \(S = \{Ax + b | Fx = g\}\) affine?

Consider two points \(y_1 = Ax_1 + b\) and \(y_2 = Ax_2 + b\) in \(S\), so that \(Fx_1 = g\) and \(Fx_2 = g\). Is \(y = \theta y_1 + (1-\theta)y_2\) in the set (for \(\theta \in \mathbf{R}\))?

\[y = \theta(Ax_1 + b) + (1-\theta)(Ax_2 + b) = A(\theta x_1 + (1-\theta)x_2) + b = Ax + b\] where \(x = \theta x_1 + (1-\theta)x_2\). Moreover, \[Fx = \theta Fx_1 + (1-\theta)Fx_2\] and since \(Fx_1 = Fx_2 = g\) we have \[Fx = \theta g + (1-\theta)g = g\]

This means \(y \in S\) and that \(S\) is affine.
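
This argument is easy to sanity-check numerically. Below is a minimal sketch (assuming numpy; the matrices \(A\), \(F\) and vectors \(b\), \(g\) are arbitrary placeholder data, with \(g\) chosen so the constraint set is nonempty):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary placeholder data for S = {Ax + b | Fx = g}.
A = rng.normal(size=(3, 4))
b = rng.normal(size=3)
F = rng.normal(size=(2, 4))
x1 = rng.normal(size=4)
g = F @ x1                      # choose g so that {x | Fx = g} is nonempty

# A second feasible point x2 = x1 + n with n in the nullspace of F;
# the trailing rows of Vt from the SVD of F span null(F) (F has rank 2 here).
_, _, Vt = np.linalg.svd(F)
x2 = x1 + Vt[2]
assert np.allclose(F @ x2, g)

y1, y2 = A @ x1 + b, A @ x2 + b
for theta in [-2.0, 0.3, 1.7]:            # theta ranges over all of R
    x = theta * x1 + (1 - theta) * x2
    y = theta * y1 + (1 - theta) * y2
    assert np.allclose(F @ x, g)          # the preimage stays feasible ...
    assert np.allclose(A @ x + b, y)      # ... so y = Ax + b is a point of S
print("affine combinations of points of S stay in S")
```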

Check whether \(S = \{ \alpha \in \mathbf{R}^{3} | \alpha_1 +\alpha_2 e^{-t} + \alpha_3 e^{-2t} \leq 1.1 \text{ for all } t \geq 1\}\) is affine, convex and/or a polyhedron.

\(S\) cannot be a polyhedron, since a polyhedron is defined by finitely many linear inequalities, while here there is one inequality for each of the infinitely many values \(t \geq 1\).

Suppose \(\alpha, \beta \in S\); then is \(\gamma = \theta \alpha + (1-\theta) \beta\) in \(S\)?

\[\gamma_1 +\gamma_2 e^{-t} + \gamma_3 e^{-2t} = \theta \alpha_1 + (1-\theta) \beta_1 + (\theta \alpha_2 + (1-\theta) \beta_2) e^{-t} + (\theta \alpha_3 + (1-\theta) \beta_3) e^{-2t}\] \[= \theta (\alpha_1 +\alpha_2 e^{-t} + \alpha_3 e^{-2t}) + (1-\theta)(\beta_1 +\beta_2 e^{-t} + \beta_3 e^{-2t}) \leq 1.1\]

  • The last inequality holds if \(\theta \in [0,1]\), since then \(\theta (\alpha_1 +\alpha_2 e^{-t} + \alpha_3 e^{-2t}) \leq \theta \cdot 1.1\) and \((1-\theta)(\beta_1 +\beta_2 e^{-t} + \beta_3 e^{-2t}) \leq (1-\theta) \cdot 1.1\), and the two bounds sum to \(1.1\).

  • The last inequality won't hold for all \(\theta \in \mathbf{R}\): if \(\theta < 0\), multiplying \(\alpha_1 +\alpha_2 e^{-t} + \alpha_3 e^{-2t} \leq 1.1\) by \(\theta\) flips the inequality, giving \(\theta(\alpha_1 +\alpha_2 e^{-t} + \alpha_3 e^{-2t}) \geq \theta \cdot 1.1\) (a similar observation holds for \(1-\theta\) when \(\theta > 1\)), so \(S\) is not affine.

Thus, \(\gamma \in S\) if \(\theta \in [0,1]\) so \(S\) is convex.
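
A small numerical spot check of the convexity claim (a sketch assuming numpy; \(\alpha = (1.1, 0, 0)\) and \(\beta = (0, 1, 0)\) are easy-to-verify members of \(S\), and a finite grid of \(t\) values stands in for the condition \(t \geq 1\)):

```python
import numpy as np

alpha = np.array([1.1, 0.0, 0.0])   # 1.1 <= 1.1 for all t, a boundary point
beta  = np.array([0.0, 1.0, 0.0])   # e^{-t} <= e^{-1} < 1.1 for t >= 1

t = np.linspace(1, 50, 2000)        # finite grid standing in for t >= 1
basis = np.vstack([np.ones_like(t), np.exp(-t), np.exp(-2 * t)])

def in_S(a, tol=1e-12):
    return bool(np.all(a @ basis <= 1.1 + tol))

assert in_S(alpha) and in_S(beta)
for theta in np.linspace(0, 1, 11):  # convex combinations stay in S
    assert in_S(theta * alpha + (1 - theta) * beta)
print("all convex combinations satisfy the constraint on the grid")
```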

If two sets can be separated by a hyperplane, then they are convex. True or False?

False. (A proof by picture works: for example, two non-convex, crescent-shaped sets on opposite sides of a line in \(\mathbf{R}^2\) are separated by that line.)

A point \(x_0\) on the boundary of a convex set uniquely defines a supporting hyperplane for the set at that point. True or False?

False. If \(x_0\) is, for example, the corner of a square in \(\mathbf{R}^2\), then there are infinitely many supporting hyperplanes at that point.

Consider the cone \(K = \{(x_1,x_2) | 0 \leq x_1 \leq x_2\} \subseteq \mathbf{R}^2\).

  • Is \((1,3) \preceq_{K} (3,4)\)? True or False? False. \((1,3) \preceq_{K} (3,4) \Longleftrightarrow (3,4) - (1,3) \in K\). So the question is whether \((2,1) \in K\). Clearly not, since \(x_1 > x_2\).

  • Is \((-1,2) \succeq_{K^{\ast}} 0\)? True or False? True. Now \((-1,2) \succeq_{K^{\ast}} 0 \Longleftrightarrow (-1,2)^{T}x \geq 0 \text{ for all } x \succeq_{K} 0\). But \((-1,2)^{T}x = -x_1 + 2x_2 \geq -x_1 + x_2 \geq 0\) for any \(x \succeq_{K} 0\) (since \(0 \leq x_1 \leq x_2\)), thus this claim is True.
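
Both claims are easy to spot-check numerically (a sketch assuming numpy; membership in \(K^{\ast}\) is only verified against sampled points of \(K\), so it is approximate):

```python
import numpy as np

def in_K(v, tol=1e-12):
    """Membership in K = {(x1, x2) | 0 <= x1 <= x2}."""
    return -tol <= v[0] <= v[1] + tol

# (1,3) <=_K (3,4)  iff  (3,4) - (1,3) = (2,1) lies in K:
print(in_K(np.array([3, 4]) - np.array([1, 3])))   # False, since x1 > x2

# (-1,2) >=_{K*} 0  iff  (-1,2)^T x >= 0 for all x in K; sample K to check:
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 5, size=10_000)
x2 = x1 + rng.uniform(0, 5, size=10_000)           # guarantees 0 <= x1 <= x2
print(bool(np.all(-x1 + 2 * x2 >= 0)))             # True
```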

What is the distance between the two parallel hyperplanes \(H_1 = \{x \in \mathbf{R}^{n} | a^Tx=b_1\}\) and \(H_2 =\{x \in \mathbf{R}^{n} | a^Tx=b_2\}\)?

Let’s turn to 2d for inspiration.

The segment with end points \((b_2/a_1,0)\) and \((0,b_2/a_2)\) lies on the hyperplane \(H_2 = \{x \in \mathbf{R}^{2} | a^Tx=b_2\}\), while the one with end points \((b_1/a_1,0)\) and \((0,b_1/a_2)\) lies on \(H_1 = \{x \in \mathbf{R}^{2} | a^Tx=b_1\}\).

We have been asked to find the perpendicular distance \(d\) between them.

Let \(x = (x_1,x_2)\) be the point of \(H_2\) closest to the origin and \(\hat{x} = (\hat{x}_1,\hat{x}_2)\) the point of \(H_1\) closest to the origin; both are collinear with the vector \(a = (a_1,a_2)\). Thus, the cosine of the angle between the vectors \(x\) with \(a\) and \(\hat{x}\) with \(a\) should equal \(1\) (since \(\cos 0 =1\); take \(b_1, b_2 > 0\) for the picture).

\(\frac{a_1x_1 + a_2x_2}{||a||_2||x||_2} = 1\) and \(\frac{a_1\hat{x}_1 + a_2\hat{x}_2}{||a||_2||\hat{x}||_2} = 1\), or that \(\frac{b_2}{||a||_2||x||_2} = 1\) and \(\frac{b_1}{||a||_2||\hat{x}||_2} = 1\) (since \(x\) and \(\hat{x}\) lie on \(H_2\) and \(H_1\) respectively).

Thus, we get that \(||x||_2 = \frac{b_2}{||a||_2}\) and \(||\hat{x}||_2 = \frac{b_1}{||a||_2}\) (these measure the lengths of the vectors \(x\) and \(\hat{x}\), i.e., the distances of \(H_2\) and \(H_1\) from the origin).

Hence, \(d = \big|\,||\hat{x}||_2 - ||x||_2\big| = \frac{|b_1 - b_2|}{||a||_2}\)

Another way to think about this: for \(\theta \in [0,1]\) a point of the form \((\frac{b_2}{a_1}\theta, \frac{b_2}{a_2}(1-\theta))\) lies on \(H_2\). So, in particular, for \(\theta = \frac{a_1^2}{||a||^2_2}\) we have that \((x_1,x_2) = (\frac{b_2}{a_1}\frac{a_1^2}{||a||^2_2}, \frac{b_2}{a_2}\frac{a_2^2}{||a||^2_2})\) lies on \(H_2\). Thus, \(x = (\frac{b_2}{||a||^2_2}a_1, \frac{b_2}{||a||^2_2}a_2)\).

Similarly, \(\hat{x} = (\frac{b_1}{||a||^2_2}a_1, \frac{b_1}{||a||^2_2}a_2)\) lies on \(H_1\).

So \(d = ||x-\hat{x}||_2 = \sqrt{ \frac{(b_2-b_1)^2}{||a||^4_2}a^2_1 + \frac{(b_2-b_1)^2}{||a||^4_2}a^2_2} = \sqrt{\frac{(b_2-b_1)^2}{||a||^2_2}} = \frac{|b_2-b_1|}{||a||_2}\)
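
The second derivation works verbatim in \(\mathbf{R}^n\): \(x = \frac{b_2}{||a||_2^2}a\) and \(\hat{x} = \frac{b_1}{||a||_2^2}a\) are the closest points of the two hyperplanes to the origin. A quick numerical check (a sketch assuming numpy; \(a\), \(b_1\), \(b_2\) are arbitrary placeholder data):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)
b1, b2 = 2.0, -3.0

x_hat = (b1 / (a @ a)) * a            # closest point of H1 to the origin
x = (b2 / (a @ a)) * a                # closest point of H2 to the origin
assert np.isclose(a @ x_hat, b1) and np.isclose(a @ x, b2)

d = np.linalg.norm(x - x_hat)
print(d, abs(b2 - b1) / np.linalg.norm(a))   # the two values agree
```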

Let \(a\) and \(b\) be distinct points in \(\mathbf{R}^{n}\) and consider the set of points that are closer (in Euclidean norm) to \(a\) than \(b\) i.e., \(\mathcal{C} = \{x | ||x-a||_2 \leq ||x-b||_2 \}\). Show that \(\mathcal{C}\) is a halfspace (namely that it can be described by a set of the form \(\{x | c^Tx \leq d\}\) for \(c \neq 0\)).

Norms are non-negative, so squaring both sides preserves the inequality: \[||x-a||_2 \leq ||x-b||_2 \Longleftrightarrow ||x-a||^{2}_2 \leq ||x-b||^{2}_2 \]

Thus, \((x-a)^T(x-a) \leq (x-b)^T(x-b)\) or \(x^Tx -2a^Tx + a^Ta \leq x^Tx -2b^Tx + b^Tb\).

So \(2(b - a)^Tx \leq b^Tb - a^Ta\), which matches the definition of a halfspace of the form \(\{x | c^Tx \leq d\}\) with \(c = b-a\) and \(d=\frac{b^Tb - a^Ta}{2}\); note that \(c \neq 0\) since \(a\) and \(b\) are distinct.
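
As a numerical double check that the two descriptions define the same set (a sketch assuming numpy; \(a\) and \(b\) are random placeholder points):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=4), rng.normal(size=4)   # two distinct points
c = b - a
d = (b @ b - a @ a) / 2

for _ in range(10_000):
    x = rng.normal(scale=5, size=4)
    closer_to_a = np.linalg.norm(x - a) <= np.linalg.norm(x - b)
    assert closer_to_a == (c @ x <= d)
print("'closer to a than b' and 'c^T x <= d' agree on all samples")
```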

Which of the following sets is convex?

  • A slab, i.e., a set of the form \(\{x \in \mathbf{R}^{n} | \alpha \leq a^Tx\leq \beta \}\). Yes, convex, since this is an intersection of two halfspaces (it is also a polyhedron). Recall a halfspace is convex and an intersection of convex sets is convex.
  • A rectangle, i.e., a set of the form \(\{x \in \mathbf{R}^{n} | \alpha_i \leq x_i\leq \beta_i, i = 1,\ldots,n \}\). A rectangle is sometimes called a hyperrectangle when \(n > 2\). Yes, convex, since it is an intersection of finitely many halfspaces (it's also a polyhedron).
  • A wedge, i.e., a set of the form \(\{x \in \mathbf{R}^{n} | a_1^Tx\leq b_1, a_2^Tx\leq b_2 \}\). Yes, convex, since it is an intersection of two halfspaces.
  • The set of points closer to a given point than a given set, i.e., \(C = \{x | \; ||x - x_0||_2 \leq ||x-y||_2 \text{ for all } y\in S\}\) where \(S \subseteq \mathbf{R}^{n}\). We can see that \(C = \cap_{y\in S} C(y)\) where \(C(y) = \{x | \; ||x - x_0||_2 \leq ||x-y||_2\}\) (for some specific choice of \(y \in S\)). Each \(C(y)\) is a halfspace by the previous problem, hence convex, and so \(C\) is convex since it is an intersection of convex sets.
  • The set of points closer to one set than another, i.e., \(\mathcal{C} = \{x | dist(x,S) \leq dist(x,T)\}\), where \(S,T \subseteq \mathbf{R}^{n}\), and \(dist(x,S) = \text{inf}\{||x-z||_2 | z \in S\}\). Suppose \(x,y \in \mathcal{C}\); then for \(\theta \in [0,1]\), is \(c=\theta x + (1-\theta) y\) in \(\mathcal{C}\)?

Since \(x,y \in \mathcal{C}\) we have \(\text{inf}\{||x-z||_2 | z \in S\} \leq \text{inf}\{||x-z||_2 | z \in T\}\) and \(\text{inf}\{||y-z||_2 | z \in S\} \leq \text{inf}\{||y-z||_2 | z \in T\}\). Multiplying these by \(\theta \geq 0\) and \(1-\theta \geq 0\) respectively and adding, \(\text{inf}\{\theta||x-z||_2 | z \in S\} + \text{inf}\{(1-\theta)||y-z||_2 | z \in S\} \leq \text{inf}\{\theta||x-z||_2| z \in T\} + \text{inf}\{(1-\theta)||y-z||_2| z \in T\}\)

In order to show \(\text{inf}\{||c-z||_2 | z \in S\} \leq \text{inf}\{||c-z||_2 | z \in T\}\) we could potentially argue that \(\text{inf}\{||c-z||_2 | z \in S\} \leq \text{inf}\{\theta||x-z||_2 | z \in S\} + \text{inf}\{(1-\theta)||y-z||_2 | z \in S\}\) (Triangle inequality?)

However we likely won’t be able to show \(\text{inf}\{||c-z||_2 | z \in T\} \leq \text{inf}\{\theta||x-z||_2| z \in T\} + \text{inf}\{(1-\theta)||y-z||_2| z \in T\}\) (it may go the other direction).

Indeed, \(\mathcal{C}\) is not convex in general. A concrete counterexample in \(\mathbf{R}\): take \(S=\{-1,1\}\) and \(T=\{0\}\). Then \(-1 \in \mathcal{C}\) and \(1 \in \mathcal{C}\), but their midpoint satisfies \(dist(0,S) = 1 > 0 = dist(0,T)\), so \(0 \notin \mathcal{C}\).
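
The counterexample in code (plain Python, no dependencies):

```python
def dist(x, S):
    """dist(x, S) = inf over z in S of |x - z| (S is finite here, so inf = min)."""
    return min(abs(x - z) for z in S)

S, T = [-1.0, 1.0], [0.0]

for x in (-1.0, 1.0):
    assert dist(x, S) <= dist(x, T)      # both endpoints lie in C
print(dist(0.0, S) <= dist(0.0, T))      # False: their midpoint 0 is not in C
```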

Let \(x\) be a real-valued random variable with \(\mathbf{prob}(x=a_{i})=p_i, i=1,\ldots,n\), where \(a_1 < a_2 < \ldots < a_n\). Of course \(p \in \mathbf{R}^n\) lies in the standard probability simplex \(P = \{p | \mathbf{1}^Tp=1, p \succeq 0\}\). Which of the following conditions are convex in \(p\)? (That is, for which of the following conditions is the set of \(p \in P\) that satisfy the condition convex?)

Now \(\mathbf{E}f(x) = \sum_{i=1}^{n}p_i f(a_i) = w^Tp\) where \(w^{T} =(f(a_1),\ldots,f(a_n))\). Essentially the \(w\)’s are constants.

  • \(\alpha \leq \mathbf{E}f(x) \leq \beta\) - This is the slab \(\{p | \alpha \leq w^Tp \leq \beta\}\), so it is convex.
  • \(\mathbf{prob}(x > \alpha) \leq \beta\) - Now \(\mathbf{prob}(x > \alpha) = \sum_{i:a_{i} > \alpha}p_{i}\). We can define an indicator function that takes the value \(0\) when \(a_i \leq \alpha\) and \(1\) otherwise. The expectation of that indicator function is the same as the desired probability. So \(\mathbf{prob}(x > \alpha) = \sum_{i} p_{i} f(a_i)\) where \(f(a_i) = I(a_i > \alpha)\), so the condition defines a halfspace and hence is convex.
  • \(\mathbf{E}|x^3| \leq \alpha \mathbf{E}|x|\) - This is nothing but \(\sum_{i=1}^{n} (|a_i^3| - \alpha|a_i|)p_i \leq 0\), which is a halfspace, hence it is convex.
  • \(\mathbf{E}|x^2| \leq \alpha\) - This is nothing but \(\sum_{i=1}^{n} |a_i^2|p_i \leq \alpha\), which is a halfspace, hence it is convex.
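
The common pattern is that each quantity is linear in \(p\). A tiny demonstration of that linearity (a sketch assuming numpy; the support points \(a_i\) and the threshold \(\alpha\) are placeholders):

```python
import numpy as np

a = np.array([-2.0, -0.5, 1.0, 3.0])   # support points a1 < ... < an
alpha = 0.0
w = (a > alpha).astype(float)          # w_i = I(a_i > alpha), so prob(x > alpha) = w^T p

rng = np.random.default_rng(0)
p1 = rng.exponential(size=a.size); p1 /= p1.sum()   # two points of the simplex
p2 = rng.exponential(size=a.size); p2 /= p2.sum()

theta = 0.3
p = theta * p1 + (1 - theta) * p2
# prob(x > alpha) is linear in p, so {p | w^T p <= beta} is a halfspace:
print(np.isclose(w @ p, theta * (w @ p1) + (1 - theta) * (w @ p2)))   # True
```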

What is the dual cone of:

  • \(K = \{0\} \subseteq \mathbf{R}^2\)? The dual cone of a cone \(K\) is \(K^{\ast} = \{y | y^Tx \geq 0 \text{ for all } x \in K\}\). So for this case we will have \(K^{\ast} = \{y | y^Tx \geq 0 \text{ for all } x \in \{0\}\}\). Hence \(K^{\ast} = \mathbf{R}^2\).

  • \(K = \mathbf{R}^2\)? \(K^{\ast} = \{y | y^Tx \geq 0 \text{ for all } x \in \mathbf{R}^2\} = \{0\}\)

  • \(K = \{ (x,y)| \text{ } |x| \leq y \}\)? A 2d picture really helps here, as we are essentially looking for the region of vectors having a non-negative inner product with every vector in \(K\). The cone turns out to be self-dual: \(K^{\ast} = \{ (x,y)| \text{ } |x| \leq y \} = K\) (see the sampling sketch after this list).

  • \(K = \{ (x,y) | x + y = 0 \}\)? Draw a picture of this line in 2d. Then convince yourself that \(K^{\ast}\) is the orthogonal line \(\{ (x,y) | x - y = 0 \}\).
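
The self-duality claim above can be spot-checked numerically (a rough sampling sketch assuming numpy; it only tests \(y^Tx \geq 0\) against finitely many sampled \(x \in K\), so it is approximate):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_K(v, tol=1e-9):
    """Membership in K = {(x, y) | |x| <= y}."""
    return abs(v[0]) <= v[1] + tol

# Collect many points of K by rejection sampling.
samples = np.array([v for v in rng.uniform(-5, 5, size=(20_000, 2)) if in_K(v)])

# v is (approximately) in K* if v^T x >= 0 for every sampled x in K.
for v in rng.uniform(-5, 5, size=(10, 2)):
    in_dual = bool(np.all(samples @ v >= -1e-9))
    print(v, in_dual == in_K(v))   # agreement on every row suggests K* = K
```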

Which conditions, in terms of ordinary inequalities on the matrix coefficients, must hold for the elements of the positive semidefinite cone \(\mathbf{S}^{n}_+\) for \(n=1,2,3\)?

Recall that:

  1. \(X \in \mathbf{S}^{n}_+ \Longleftrightarrow z^TXz \geq 0\) for all \(z\).
  2. Per [1], a symmetric \(n \times n\) matrix is positive semidefinite if and only if all principal minors (determinants of symmetric submatrices) of the matrix are nonnegative. In contrast, for a symmetric \(n \times n\) matrix to be positive definite we only need all of the leading principal minors (upper left determinants) to be positive (Sylvester's criterion).

  • \(n=1,\quad X = [x_1]\): We must have \(z^2x_1 \geq 0\) for all \(z\). Since \(z^2 \geq 0\) we must have \(x_1 \geq 0\).

  • \(n=2\), \[\begin{equation*} X = \begin{bmatrix} x_1 & x_2 \\ x_2 & x_3 \end{bmatrix} \end{equation*}\] We want \[\begin{equation*} \begin{bmatrix} z_1 & z_2 \end{bmatrix} \begin{bmatrix} x_1 & x_2 \\ x_2 & x_3 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} =x_1z_1^2 + 2x_2z_1z_2 + x_3z_2^2 \geq 0 \end{equation*}\] So for \(X\) to be positive semidefinite we must have \(x_1 \geq 0\), \(x_3 \geq 0\) and \(x_1x_3 -x^2_2 \geq 0\).

  • \(n=3\), \[\begin{equation*} X= \begin{bmatrix} x_1 & x_2 & x_3 \\ x_2 & x_4 & x_5 \\ x_3 & x_5 & x_6 \end{bmatrix} \end{equation*}\] So for \(X\) to be positive semidefinite we must have \(x_1 \geq 0\), \(x_4 \geq 0\), \(x_6 \geq 0\), \(x_4x_6 -x^2_5 \geq 0\), \(x_1x_6 -x^2_3 \geq 0\), \(x_1x_4 -x^2_2 \geq 0\) and \(x_6x_4x_1 - x_6x_2^2 - x_4x_3^2 -x_5^2x_1 + 2x_5x_3x_2 \geq 0\)
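
The principal minor criterion from [1] is easy to test against the eigenvalue definition (a sketch assuming numpy; random symmetric \(3 \times 3\) matrices):

```python
import numpy as np
from itertools import combinations

def all_principal_minors_nonneg(X, tol=1e-9):
    """Prussing's test [1]: X is PSD iff every principal minor is nonnegative."""
    n = X.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):    # rows/cols to keep
            if np.linalg.det(X[np.ix_(idx, idx)]) < -tol:
                return False
    return True

rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.normal(size=(3, 3))
    X = (A + A.T) / 2                            # random symmetric matrix
    psd = bool(np.all(np.linalg.eigvalsh(X) >= -1e-9))
    assert all_principal_minors_nonneg(X) == psd
print("principal minor test matches the eigenvalue test")
```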

Show the set \(C = \{(x,y) \in \mathbf{R}^2_{++} | xy \geq \alpha \}\) is convex.

Consider two points \((x_1,y_1) \in C\) and \((x_2,y_2) \in C\). So we have that \(y_1 \geq \frac{\alpha}{x_1}\) and \(y_2 \geq \frac{\alpha}{x_2}\) (since both \((x_1,y_1), (x_2,y_2) \in \mathbf{R}^2_{++}\) we can safely divide by \(x_1\) and \(x_2\)). Thus, for \(\theta \in [0,1]\) we have

\[\theta y_1 + (1-\theta)y_2 \geq \alpha\left[ \frac{\theta}{x_1} + \frac{1-\theta}{x_2} \right]\]

Multiplying both sides by \(\theta x_1 + (1-\theta)x_2\) we get that

\[(\theta x_1 + (1-\theta)x_2)(\theta y_1 + (1-\theta)y_2) \geq \alpha\left[ \theta^2 + \theta(1-\theta)\frac{x_1}{x_2} + \theta(1-\theta)\frac{x_2}{x_1} + (1-\theta)^2 \right]\] \[ = \alpha\left[ 1 + 2\theta^2 -2\theta + \theta(1-\theta)\frac{x_1^2 + x_2^2}{x_1x_2} \right]\] \[ = \alpha\left[ 1 - 2\theta(1 -\theta) + \theta(1-\theta)\frac{x_1^2 + x_2^2}{x_1x_2} \right]\] \[ = \alpha\left[ 1 - \theta(1 -\theta)\left(2- \frac{x_1^2 + x_2^2}{x_1x_2}\right) \right]\] \[ = \alpha\left[ 1 + \theta(1 -\theta)\frac{(x_1 - x_2)^2}{x_1x_2} \right]\]

Since \(1 + \theta(1 -\theta)\frac{(x_1 - x_2)^2}{x_1x_2} \geq 1\) (and \(\alpha \geq 0\); for \(\alpha < 0\) the constraint is vacuous on \(\mathbf{R}^2_{++}\)), we have shown that \((\theta x_1 + (1-\theta)x_2)(\theta y_1 + (1-\theta)y_2) \geq \alpha\).

Thus \((\theta x_1 + (1-\theta)x_2, \theta y_1 + (1-\theta)y_2) \in C\) and hence \(C\) is convex.

Note that for \(\alpha = 0\), \(C = \{(x,y) \in \mathbf{R}^2_{+} | xy \geq 0 \} = \mathbf{R}^2_{+}\), which is convex.
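
A randomized spot check of the convexity proof (a sketch assuming numpy; the choice \(\alpha = 2\) is arbitrary):

```python
import numpy as np

alpha = 2.0
rng = np.random.default_rng(0)

def in_C(x, y, tol=1e-12):
    return x > 0 and y > 0 and x * y >= alpha - tol

for _ in range(10_000):
    # Sample two points of C = {(x, y) in R^2_{++} | xy >= alpha} ...
    x1, x2 = rng.uniform(0.1, 5, size=2)
    y1 = alpha / x1 + rng.uniform(0, 3)
    y2 = alpha / x2 + rng.uniform(0, 3)
    theta = rng.uniform()
    # ... and check that their convex combination stays in C.
    assert in_C(theta * x1 + (1 - theta) * x2, theta * y1 + (1 - theta) * y2)
print("all sampled convex combinations remain in C")
```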

Notation

  • Let \(A\) be a symmetric \(n \times n\) matrix; then a principal minor of order \(k\) is obtained by deleting \(n-k\) rows and the \(n-k\) columns with the same numbers (where \(k \in \{1,\ldots,n\}\)). Use \(\Delta_k\) for a principal minor of order \(k\).
    • There are \(\binom{n}{k}\) principal minors of order \(k\). Thus, there are a total of \(2^{n}-1\) principal minors.
  • The leading principal minor of \(A\) of order \(k\) is the minor of order \(k\) obtained by deleting the last \(n-k\) rows and columns (where \(k \in \{1,\ldots,n\}\)). Use \(D_k\) for the leading principal minor of order \(k\).
    • The leading principal minors are a subset of the principal minors.
    • The leading principal minors are the upper left determinants of the matrix.

References

[1]
J. E. Prussing, “The principal minor test for semidefinite matrices,” Journal of Guidance, Control, and Dynamics, vol. 9, no. 1, pp. 121–122, 1986.