Weight/Mixing Matrix

In distributed optimization, existing works always employ a weight matrix/mixing matrix \(W\) to aggregate the information from neighbors. In this blog, I am going to first go through these assumptions and explicitly show their relations, for any undirected graph \(G=(\mathcal{V},\mathcal{E})\). Suppose there are \(n\) nodes, \(\mathcal{V} = \{1,2,\ldots,n\}\), and every node has a self-loop.


Assumption 1: The mixing matrix \(W = [w_{ij}]\in [0,1]^{n\times n}\) is defined according to the network topology: \(w_{ij}>0\) if \(\{i,j\}\in\mathcal{E}\); \(w_{ij}=0\), otherwise. W is doubly stochastic (\(W\mathbf{1} = \mathbf{1}, \mathbf{1}^T W = \mathbf{1}^T\)), and the spectral radius \(\rho(W-\frac{\mathbf{11}^T}{n})=\sigma<1\).

Note: There are many variations of the above assumption. For example, one way is to replace the doubly stochasticity with symmetry and stochasticity for undirected graph only!.


Why do we need such an assumption?

To make all the agents achieve consensus! From [1], Assumption 1 holds if and only if

\[\lim_{k\to \infty} W^k = \frac{\mathbf{11}^T}{n}.\]

In order words, following the update \(x^{k+1} = Wx^k\), we can derive the consensus

\[x^{\infty} = \frac{\mathbf{11}^T}{n} x^0.\]



Useful Facts of Stochastic Matrix

Fact 1: For any stochastic matrix \(A\), its absolute eigenvalues lie in \([0,1]\).

Fact 2: For any stochastic matrix \(A\), \(A\mathbf{1} = \mathbf{1}\).

Proof: See [2].

Eigenvalues of Weight Matrix


It’s easy to show that the number of connected components = \(\text{rank}(W-I)\). If \(G\) is connected, there is only one eigenvalue \(\lambda_1 = 1\). Let the eigenvalues of \(W\) be \(1=\lambda_1>|\lambda_2|\ge |\lambda_3|\ge \cdots\ge|\lambda_n|\). The the spectral gap of \(W = 1-|\lambda_2|:=\theta\).

What’s the relation between the spectral gap of \(W\) and \(\sigma\).


If \(W\) is symmetric, then \(\rho(W-\frac{\mathbf{11}^T}{n})= \Vert W-\frac{\mathbf{11}^T}{n}\Vert_2 = \lambda_{\max}(W-\frac{\mathbf{11}^T}{n}) = |\lambda_2|\).

Proof: Denote \(W = Q\Lambda Q^T\) as the eigendecomposition of \(W\). Then,

\[\begin{aligned} W- \frac{\mathbf{11}^T}{n} =& Q\begin{bmatrix}1 &\\ & Z\end{bmatrix}Q^T -\frac{\mathbf{11}^T}{n}\\ =& Q\begin{bmatrix}1 &\\ & Z\end{bmatrix}Q^T -Q\begin{bmatrix}1 &\\ & \mathbf{O}\end{bmatrix}Q^T\\ = & Q\begin{bmatrix}0 &\\ & Z\end{bmatrix}Q^T. \end{aligned}\]

The \(\lambda_{\max}(Z) = \vert\lambda_{2}\vert\), which completes the proof.


Therefore, the spectral gap of \(W = 1-\sigma\).



Indications of \(\sigma\)


Apparently, \(\sigma\) indicates the connectivity of the graph \(G\).

  1. If \(\sigma = 0\), the graph is fully connected.
  2. Smaller \(\sigma\) implies better connectivity.
  3. Using lazy Metropolis weights, the relationship between \(n\) and \(\frac{1}{1-\lambda}\) could be found in the Proposition 5, [3].

References

[1] Xiao, L., & Boyd, S. (2004). Fast linear iterations for distributed averaging. Systems & Control Letters, 53(1), 65-78.

[2] Javad Mashreghi (2017). Slides: https://www.birs.ca/workshops/2017/17w2668/files/Eigenvalues-Doubly-Stochastic-Matrices-javad%20-%202017%20-%20Banff.pdf.

[3] A. Nedić, A. Olshevsky and M. G. Rabbat, “Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization,” in Proceedings of the IEEE, vol. 106, no. 5, pp. 953-976, May 2018, doi: 10.1109/JPROC.2018.2817461.