Aug-PDG: Linear Convergence of an Algorithm for Convex Optimization with Inequality Constraints

MENG Min, LI Xiuxian

(Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Shanghai Autonomous Intelligent Unmanned Systems Science Center, Tongji University, Shanghai 201210, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 200092, China)

This paper deals with the constrained optimization problem formulated as follows:

min_{x ∈ R^n} f(x)  s.t.  g(x) ≤ 0,    (1)

where the objective function f: R^n → R and g(x) = (g1(x), g2(x), ..., gm(x))^T with gi: R^n → R are convex and continuously differentiable. By resorting to the Lagrangian (or augmented Lagrangian) L(x, λ) of problem (1), the corresponding primal-dual gradient algorithm (PDG) (or augmented primal-dual gradient algorithm, Aug-PDG) can be designed as

x_{k+1} = x_k − α∇_x L(x_k, λ_k),  λ_{k+1} = [λ_k + α∇_λ L(x_k, λ_k)]_+,    (2)

where α > 0 is a stepsize and [·]_+ denotes the componentwise projection onto the nonnegative orthant. It is known that Eq. (2) can find a saddle point of L(x, λ), and thus it has been extensively studied for solving the constrained optimization problem [1].

Optimization has wide applications in the artificial intelligence field, such as smart grids [2-3], wireless communication [4], robot systems [5], and game theory [6-7], to name just a few. To date, there is a large body of literature on the theoretical analysis of asymptotic convergence of various algorithms, including primal-dual gradient-based algorithms, for tackling the optimization problem under different settings [8-18].

In recent decades, researchers have focused on the linear convergence and exponential convergence of primal-dual gradient-based algorithms in discrete time and continuous time, respectively. It is well known that when the objective function is strongly convex and smooth, the gradient descent algorithm for unconstrained convex optimization can achieve global exponential convergence in continuous time and global linear convergence in discrete time. In the context of constrained optimization with equality constraints Ax = b or affine inequality constraints Ax ≤ b, PDG is proved to converge globally exponentially in the continuous-time setup [19]. A proximal gradient flow was proposed in [20], which can be applied to resolve convex optimization problems with affine inequality constraints and has global exponential convergence when A has full row rank. Local exponential convergence of the primal-dual gradient dynamics can be established with the help of spectral bounds of saddle matrices [21]. Recently, the authors in [22] proved that the continuous-time Aug-PDGD for optimization with affine equality and inequality constraints achieves global exponential convergence, and the global linear convergence of primal-dual gradient optimization (PDGO) in discrete time was discussed in [23] via contraction theory. It should be noted that the aforementioned works focus on unconstrained optimization or constrained optimization with affine equality and/or affine inequality constraints. For the case with nonlinear inequality constraints, asymptotic convergence has been extensively studied, such as in [24]. However, linear/exponential convergence for optimization with nonlinear inequality constraints is seldom investigated in the literature. One exception is the recent work [25], where the authors established semi-global exponential convergence of the continuous-time Aug-PDGD, in the sense that the convergence rate depends on the distance from the initial point to the optimal point.

However, [25] concentrates on continuous-time dynamics. As discrete-time algorithms are easily implemented in practical applications, this paper addresses a discrete-time algorithm for the optimization problem with nonlinear inequality constraints. A theoretical analysis based on a quadratic Lyapunov function with non-zero off-diagonal terms is first presented to show that the Aug-PDG achieves semi-global linear convergence.

The rest of this paper is organized as follows. Section 2 introduces preliminaries on optimization with nonlinear inequality constraints. The main result on the semi-global linear convergence of the Aug-PDG, along with its proof, is presented in Section 3. Section 4 provides a numerical example to illustrate the feasibility of the obtained result. Section 5 gives a brief conclusion.

Notations. Let R^m, R^m_+ and R^{m×n} be the sets of m-dimensional real column vectors, m-dimensional nonnegative column vectors and m×n real matrices, respectively. Define [x]_+ to be the component-wise projection of a vector x ∈ R^m onto R^m_+. For a vector x ∈ R^m, x ≥ 0 means that each entry of x is nonnegative. For an integer n > 0, denote [n] := {1, 2, ..., n}. I_n is the identity matrix of dimension n. 1_n (resp. 0_n) represents the n-dimensional vector with all of its elements being 1 (resp. 0). For a vector or matrix A, A^T denotes the transpose of A, and A_I is the matrix composed of the rows of A with indices in I. For real symmetric matrices P and Q, P ≻ (≽, ≺, ≼) Q means that P − Q is positive (positive semi-, negative, negative semi-) definite, while for two vectors/matrices w, v of the same dimension, w ≤ v means that each entry of v − w is nonnegative. diag{a1, a2, ..., an} represents the diagonal matrix with a_i, i ∈ [n], on its diagonal.

Consider problem (1). An augmented Lagrangian associated with problem (1) is introduced as [26]

where x ∈ R^n, λ = (λ1, λ2, ..., λm)^T ∈ R^m, ρ > 0 is the penalty parameter, and

It can be verified that U(x, λ) is convex in x and concave in λ, and U(x, λ) is continuously differentiable, i.e.,

where e_i is the n-dimensional vector with the i-th entry being 1 and the others 0. Then the Aug-PDG is explicitly written as

where α ∈ (0, ρ] is the stepsize to be specified. Here, the initial conditions are arbitrarily chosen as x_0 ∈ R^n and λ_0 ≥ 0.
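Since the explicit expressions (3)-(7) are not reproduced in this extraction, the following minimal Python sketch assumes the standard augmented Lagrangian U(x, λ) = f(x) + (1/(2ρ)) Σ_i ([λ_i + ρg_i(x)]_+^2 − λ_i^2), whose gradients are ∇_xU(x, λ) = ∇f(x) + Σ_i [λ_i + ρg_i(x)]_+ ∇g_i(x) and ∇_λU(x, λ) = ([λ + ρg(x)]_+ − λ)/ρ; the toy problem, its data and the parameter values below are illustrative assumptions, not the paper's example.

```python
import numpy as np

# Aug-PDG sketch on an assumed toy problem with one nonlinear constraint:
#   minimize f(x) = ||x - c||^2  subject to  g(x) = ||x||^2 - 1 <= 0.
c = np.array([2.0, 1.0])
f_grad = lambda x: 2.0 * (x - c)
g = lambda x: np.array([x @ x - 1.0])           # constraint values g(x)
g_jac = lambda x: (2.0 * x).reshape(1, -1)      # Jacobian of g at x

rho, alpha = 0.1, 0.05                          # penalty rho, stepsize alpha in (0, rho]
x, lam = np.zeros(2), np.zeros(1)               # arbitrary x0 and lambda0 >= 0
for _ in range(5000):
    mult = np.maximum(lam + rho * g(x), 0.0)    # [lambda + rho*g(x)]_+
    grad_x = f_grad(x) + g_jac(x).T @ mult      # assumed nabla_x U(x, lambda)
    grad_lam = (mult - lam) / rho               # assumed nabla_lambda U(x, lambda)
    x = x - alpha * grad_x                      # primal descent step
    lam = np.maximum(lam + alpha * grad_lam, 0.0)  # projected dual ascent step

x_star = c / np.linalg.norm(c)                  # analytic optimum on the unit circle
print(np.round(x, 4), np.round(x_star, 4))
```

For this toy problem the dual update keeps λ_k nonnegative for all k (consistent with Lemma 1 below), and the iterates settle at the KKT pair x* = c/‖c‖, λ* = ‖c‖ − 1.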

To proceed, the following results are vital for solving the constrained optimization problem.

Lemma 1  For the Aug-PDG (7), if λ_0 ≥ 0, then λ_k ≥ 0 for all k ≥ 0.

Proof  This result can be proved by mathematical induction, and the details are omitted here.

Lemma 2  A primal-dual pair (x*, λ*) is an equilibrium point of the Aug-PDG (7) if and only if (x*, λ*) is a Karush-Kuhn-Tucker (KKT) point of (1).

Proof  If a primal-dual pair (x*, λ*) is an equilibrium point of the Aug-PDG (7), that is, x* = x* − α∇_x L(x*, λ*) and λ* = [λ* + α∇_λ L(x*, λ*)]_+, then ∇_x L(x*, λ*) = 0 and ∇_λ L(x*, λ*) = 0. The condition ∇_λ L(x*, λ*) = 0 is equivalent to

Thus, it can be claimed that the primal-dual pair (x*, λ*) is a KKT point.

Conversely, if (x*, λ*) is a KKT point of (1), then

Via a simple computation, ∇_x L(x*, λ*) = 0 and ∇_λ L(x*, λ*) = 0, which implies that (x*, λ*) is an equilibrium point of the Aug-PDG (7).

In this section, the main result on the linear convergence of the Aug-PDG is presented.

3.1 Convergence results

The following assumptions are essential for deriving the main result.

Assumption 1  The problem (1) has a unique optimal solution x*, and the linear independence constraint qualification (LICQ) holds at x*, i.e., {∇g_i(x*): i ∈ I} is linearly independent, where I := {i ∈ [m] | g_i(x*) = 0} is the so-called active set at x*.

Under Assumption 1, the optimal Lagrangian multiplier λ* is also unique [27]. Denote by J the Jacobian of g(x) at x* and by J_I the matrix composed of the rows of J with indices in I. The LICQ in Assumption 1 also implies that J_I has full row rank [25]. Define

to be the smallest eigenvalue of

Assumption 2  The objective function f(x) has quadratic gradient growth with parameter µ > 0 over R^n, i.e., for any x ∈ R^n,

(∇f(x) − ∇f(x*))^T (x − x*) ≥ µ‖x − x*‖^2.

The concept of quadratic gradient growth was introduced in [28] as a relaxation of the strong convexity condition for guaranteeing linear convergence of gradient-based optimization algorithms. In fact, the class of functions having quadratic gradient growth includes the strongly convex functions as a proper subset, and some functions with quadratic gradient growth are not even convex.
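As an illustration (a sketch using f(x) = x^2 + 3 sin^2(x), a standard example from the quadratic-growth literature rather than from this paper), one can verify numerically that a function may satisfy the quadratic gradient growth condition (∇f(x) − ∇f(x*))^T(x − x*) ≥ µ‖x − x*‖^2 with x* = 0 and some µ > 0 while being non-convex:

```python
import numpy as np

# f(x) = x^2 + 3 sin^2(x): check (f'(x) - f'(0)) * (x - 0) >= mu * x^2
# for some mu > 0 on a fine grid, while f''(x) < 0 somewhere (non-convex).
fp = lambda x: 2 * x + 3 * np.sin(2 * x)        # f'(x); note f'(0) = 0
fpp = lambda x: 2 + 6 * np.cos(2 * x)           # f''(x)

xs = np.linspace(-20, 20, 200001)
xs = xs[np.abs(xs) > 1e-6]                      # exclude the minimizer x* = 0
ratio = fp(xs) * xs / xs**2                     # (f'(x) - f'(0))(x - 0) / x^2
print(f"admissible mu up to about {ratio.min():.3f}")
print(f"min f'' on the grid: {fpp(xs).min():.3f}")
```

The smallest ratio stays strictly positive (roughly 0.69), so quadratic gradient growth holds with that µ, while f'' dips below zero, confirming non-convexity.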

Assumption 3  The objective function f is l-smooth over R^n, i.e., ‖∇f(x) − ∇f(y)‖ ≤ l‖x − y‖ for any x, y ∈ R^n.

For any i ∈ [m], g_i(x) is L_{gi}-smooth and has a bounded gradient, i.e., for some L_{gi}, B_{gi} > 0 and any x, y ∈ R^n, there holds ‖∇g_i(x) − ∇g_i(y)‖ ≤ L_{gi}‖x − y‖ and ‖∇g_i(x)‖ ≤ B_{gi}.

Denote

Under Assumption 3,one can obtain that

Before giving the main result of this paper, it is convenient to introduce the following concept, similar to that in the continuous-time setting [29].

Definition 1  Consider the dynamics z(t+1) = φ(z(t)) with initial point z(0) = z0. Assume that ze is an equilibrium point satisfying ze = φ(ze). ze is said to be a semi-global linear stable point if for any h > 0, there exist c > 0 and 0 < γ < 1 such that for any z0 satisfying ‖z0 − ze‖ ≤ h, ‖z(t) − ze‖ ≤ cγ^t‖z0 − ze‖ for all t ≥ 0. ze is said to be a global linear stable point if c and γ do not depend on h.

Then the main result is presented as follows.

Theorem 1  Under Assumptions 1-3, if the stepsize 0 < α < 1 is chosen such that

where δ > 0 satisfies

where 0 < γ < 1 satisfies

Proof The proof is postponed to the next subsection.

Remark 1  The selection of parameters α and δ ensures that c1, c2, c3 are positive, and then γ > 0 can be guaranteed. From Theorem 1, one can see that the convergence rate is related to π*, in which the distance d0 between the initial point and the optimal one is involved. Therefore, the convergence rate decreases to 0 as d0 goes to infinity. The rate also changes as (x_k, λ_k) approaches the optimal point. In fact, any (x_k, λ_k) can be regarded as an initial point of the studied algorithm, in the sense of running the algorithm from the point (x_k, λ_k). Then, when (x_k, λ_k) approaches the optimal point, π* is smaller, leading to a smaller 1 − γ. As a result, the rate is slow at the beginning and then becomes fast as x_k goes to the optimal point. Consequently, Theorem 1 does not guarantee global linear convergence, and only semi-global linear convergence can be ensured.

Remark 2  Compared with the most related work [25], where a continuous-time algorithm, called Aug-PDGD, was studied with semi-global exponential convergence, a discrete-time algorithm, Aug-PDG, is analyzed here with semi-global linear convergence. Although discrete-time algorithms may be obtained by discretizing the continuous-time Aug-PDGD using, for example, the explicit Euler method, it is unclear how to select the sampling stepsize to guarantee convergence, especially in the sense of semi-global convergence. In comparison, an explicit bound on the stepsize α is established here. However, one drawback is that the upper bound of α depends on the bounds of the cost function, the constraint functions and their gradients, as well as the optimal solution. This may be tackled by adaptive methods, which will be our future research of interest.

3.2 Proof of Theorem 1

To prove Theorem 1, intermediate results are first presented as follows.

Lemma 3 [22]  For any y, y* ∈ R, there exists ξ ∈ [0, 1] such that [y]_+ − [y*]_+ = ξ(y − y*). Specifically, ξ can be chosen as ξ = ([y]_+ − [y*]_+)/(y − y*) if y ≠ y*, and as any value in [0, 1] if y = y*.
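A quick numerical sanity check of Lemma 3 (a sketch; pos below implements [·]_+, and the difference-quotient choice of ξ is used when y ≠ y*):

```python
import numpy as np

# Verify Lemma 3 on random scalar pairs: the difference quotient
# xi = ([y]_+ - [y*]_+) / (y - y*) always lies in [0, 1] (since [.]_+
# is monotone and 1-Lipschitz) and reproduces [y]_+ - [y*]_+ exactly.
rng = np.random.default_rng(0)
pos = lambda v: np.maximum(v, 0.0)              # the projection [.]_+

for _ in range(10000):
    y, ys = rng.uniform(-5, 5, size=2)
    xi = (pos(y) - pos(ys)) / (y - ys) if y != ys else 0.0
    assert 0.0 <= xi <= 1.0
    assert abs((pos(y) - pos(ys)) - xi * (y - ys)) < 1e-12
print("Lemma 3 verified on 10000 random scalar pairs")
```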

Lemma 4  Under Assumptions 1-3, x_k generated by the Aug-PDG (7) satisfies

Proof  By the iterations in (7), one has that

Based on

for the second term on the right side of Eq. (18), one has

Note that

Define

if (ρg_i(x_k) + λ_{i,k}) − (ρg_i(x*) + λ*_i) ≠ 0. Then ξ_{i,k} ∈ [0, 1]. It can be obtained from Lemma 3 that

Substituting Eq. (23) into Eq. (20) yields that

By Eqs. (19)-(24) and Assumption 3, one has that

For the third term on the right side of Eq. (18),

where the inequality is derived based on Assumption 2 and the convexity of U(x, λ) in x, i.e.,

for any x, x' ∈ R^n. Plugging Eq. (25) and Eq. (26) into Eq. (18), one concludes that Eq. (17) holds.

Lemma 5  Under Assumptions 1-3, λ_k generated by the Aug-PDG (7) satisfies

Proof  For ‖λ_{k+1} − λ*‖^2, by iteration (7b), one has

Recalling ∇_λL(x*, λ*) = ∇_λU(x*, λ*) = 0 and the notation of ξ_{i,k} in Eqs. (21)-(22), it can be obtained that

where Ξ_k = diag{ξ_{1,k}, ..., ξ_{m,k}}, and the inequality is obtained based on Eq. (12) and the facts that ‖Ξ_k‖ ≤ 1 and ‖Ξ_k − I_m‖ ≤ 1 since ξ_{i,k} ∈ [0, 1] for i ∈ [m].

As U(x, λ) is concave in λ, one has

By Eqs. (30)-(31), it can be derived from Eq. (29) that Eq. (28) holds.

In what follows, we prove Theorem 1 in detail.

Proof of Theorem 1  Define

where

The bounds of ‖x_{k+1} − x*‖^2 and ‖λ_{k+1} − λ*‖^2 are given in Lemmas 4 and 5, respectively. It suffices to compute the bound of 2δ(x_{k+1} − x*)^T J^T (λ_{k+1} − λ*).

Note that (x*, λ*) is the KKT point of (1), that is,

It is easy to obtain that

By Eq. (24), one has that

Similar to Eqs. (21)-(22), define

then

where Ξ_λ := diag{ξ_{1,λ}, ..., ξ_{m,λ}}. For the last term of Eq. (36), it holds that

Therefore, by Eqs. (30) and (37)-(41), Eq. (36) can be rewritten as

where the last inequality is obtained by a simple computation, along with Eqs. (24), (30) and (38).

Define

Combining Eqs. (17), (28), (42) and (44), the bound of V_{δ,k+1} can be derived as

Hence, to prove Q ≼ 0, it suffices to ensure

By Eq. (16), one can obtain that

where

Subsequently, by mathematical induction, Eq. (52) can be proved. The proof is completed.

In this section, an example motivated by applications in power systems [25] is presented to illustrate the feasibility of the discrete-time Aug-PDG (7). Consider the following constrained optimization problem:

where x = (p1, ..., pn, q1, ..., qn)^T and pv,i, Si, i ∈ [n] are constants. The problem in Eq. (53), along with an affine inequality constraint, was considered in [25] but via the continuous-time dynamics Aug-PDGD. Affine inequality constraints can be regarded as special nonlinear constraints, and hence the algorithm Aug-PDG in this paper is applicable to the optimization problem in Eq. (53).

Let n = 10, S = (S1, ..., Sn) = (2.7, 1.35, 2.7, 1.35, 2.025, 2.025, 2.7, 2.7, 1.35, 2.025) and pv = (pv,1, ..., pv,n) = 4S. Choose ρ = 0.1. Three cases are simulated, where the initial point (x0, λ0) is selected randomly such that the distance d0 from the initial point (x0, λ0) to the optimal point (x*, λ*) is 0.1‖(x*, λ*)‖, 5‖(x*, λ*)‖ and 10‖(x*, λ*)‖, respectively. The curves of the normalized distance ‖(x_k − x*, λ_k − λ*)‖/‖(x*, λ*)‖ with respect to the iteration k are shown in Fig. 1 when choosing α = 0.1 and in Fig. 2 when choosing α = 0.05, where for each case, 10 instances of randomly selected initial points are considered. From Fig. 1, it can be seen that the convergence rates are different for different d0, and the distance ‖(x_k − x*, λ_k − λ*)‖ decays linearly on the whole. From Figs. 1 and 2, when the algorithm converges under an appropriately chosen stepsize α, it can be seen that the convergence rate is smaller if α is smaller. Moreover, for each case, the decreasing rate also changes as (x_k, λ_k) approaches the optimal point. Specifically, the decreasing rates are small at the beginning and then become large as (x_k, λ_k) goes to the optimal point. These observations support the semi-global linear convergence of the Aug-PDG, which is consistent with our theoretical analysis.
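Problem (53) and its full data are not reproduced in this extraction, so the following sketch mirrors only the flavor of this experiment on an assumed toy ball-constrained problem (the problem, its data and all parameter values are illustrative assumptions): Aug-PDG is run from initial points at different distances d0 from (x*, λ*), and the final normalized distances are recorded.

```python
import numpy as np

# Toy problem (an assumption, not problem (53)):
#   minimize ||x - c||^2  s.t.  ||x||^2 - 1 <= 0,
# whose KKT pair is known in closed form.
c = np.array([2.0, 1.0])
x_star = c / np.linalg.norm(c)                    # analytic optimum
lam_star = np.array([np.linalg.norm(c) - 1.0])    # analytic multiplier
z_star = np.concatenate([x_star, lam_star])

def aug_pdg(x0, lam0, rho=0.1, alpha=0.02, iters=4000):
    """Run Aug-PDG with a Bertsekas-type augmented Lagrangian (assumed form)."""
    x, lam = x0.copy(), lam0.copy()
    for _ in range(iters):
        mult = np.maximum(lam + rho * (x @ x - 1.0), 0.0)
        x = x - alpha * (2.0 * (x - c) + 2.0 * mult * x)   # primal step
        lam = np.maximum(lam + alpha * (mult - lam) / rho, 0.0)  # dual step
    return np.concatenate([x, lam])

rng = np.random.default_rng(1)
final_err = {}
for scale in (0.1, 1.0, 2.0):                     # d0 = scale * ||(x*, lam*)||
    d = rng.standard_normal(3)
    d *= scale * np.linalg.norm(z_star) / np.linalg.norm(d)
    x0 = z_star[:2] + d[:2]
    lam0 = np.maximum(z_star[2:] + d[2:], 0.0)    # keep lambda0 >= 0
    z_T = aug_pdg(x0, lam0)
    final_err[scale] = np.linalg.norm(z_T - z_star) / np.linalg.norm(z_star)
print(final_err)
```

With this small stepsize, all three initial distances yield convergence to the KKT point; for much larger d0, the admissible stepsize shrinks, consistent with the semi-global (rather than global) nature of the rate in Theorem 1.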

Fig. 1  Simulation of the relative distances to ‖(x*, λ*)‖ with respect to iteration k by choosing α = 0.1

Fig. 2  Simulation of the relative distances to ‖(x*, λ*)‖ with respect to iteration k by choosing α = 0.05

In this paper, the linear convergence of the Aug-PDG in discrete time for convex optimization with nonlinear inequality constraints has been investigated. Under some mild assumptions, the Aug-PDG has been proved to converge semi-globally at a linear rate, which depends on the distance from the initial point to the optimal point. Future research of interest may be to adopt adaptive methods to determine the upper bound of the stepsize.
