
SoftwareX 14 (2021) 100690

Contents lists available at ScienceDirect

SoftwareX
journal homepage: www.elsevier.com/locate/softx

Original software publication

DynaProg: Deterministic Dynamic Programming solver for finite horizon multi-stage decision problems

Federico Miretti ∗, Daniela Misul, Ezio Spessa
IC Engines Advanced Laboratory, Dipartimento Energia, Politecnico di Torino, c.so Duca degli Abruzzi 24, 10129 Torino, Italy

Article info

Article history:
Received 18 February 2021
Received in revised form 23 March 2021
Accepted 23 March 2021

Keywords:
Dynamic Programming
Optimal control
Decision problem

Abstract

DynaProg is an open-source MATLAB toolbox for solving multi-stage deterministic optimal decision problems using Dynamic Programming. This class of optimal control problems can be solved with Dynamic Programming (DP), which is a well-established optimal control technique suited for highly non-linear dynamic systems. Unfortunately, the numerical implementation of Dynamic Programming can be challenging and time consuming, which may discourage researchers from adopting it. The toolbox addresses these issues by providing a numerically fast DP optimization engine wrapped in a simple interface that allows the user to set up an optimal control problem in a straightforward yet flexible environment, with no restrictions on the controlled system's simulation model. It therefore enables researchers to easily explore the usage of Dynamic Programming in their fields of expertise. Thorough documentation and a set of step-by-step examples complete the toolbox, allowing for easy deployment and providing insight into the optimization engine. Finally, the source code's class-oriented design allows researchers experienced in Dynamic Programming to extend the toolbox if needed.

© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Code metadata
Current code version v1.1
Permanent link to code/repository used for this code version https://github.com/ElsevierSoftwareX/SOFTX-D-21-00036
Code Ocean compute capsule –
Legal Code License MIT license
Code versioning system used git
Software code languages, tools, and services used MATLAB
Compilation requirements, operating environments & dependencies MATLAB R2020a or newer
If available Link to developer documentation/manual Documentation is included in the software package
Support email for questions [email protected]

Software metadata
Current software version v1.1
Permanent link to executables of this version https://github.com/fmiretti/DynaProg
Legal Software License MIT license
Computing platforms/Operating Systems Linux, OS X, Microsoft Windows
Installation requirements & dependencies MATLAB R2020a or newer
If available, link to user manual (if formally published, include a reference in the reference list) Documentation is included in the software package
Support email for questions [email protected]

∗ Corresponding author.
E-mail addresses: [email protected] (Federico Miretti), [email protected] (Daniela Misul), [email protected] (Ezio Spessa).
https://doi.org/10.1016/j.softx.2021.100690

1. Motivation and significance
Multi-stage optimal decision problems describe a wide class of control problems where decisions must be made in stages in order to minimize a certain total cost. As the stages progress, the system evolves based on its own dynamics, which are influenced by the decisions themselves. If the system's evolution at each stage is fully predictable, the problem is deterministic. If, instead, the system's evolution is influenced by some random phenomena, the problem is stochastic. Moreover, the decision problem may be defined over a finite horizon or an infinite horizon, based on whether the number of stages is finite or not.

Dynamic Programming (DP) is a technique that is applied to the very wide field of optimal control in multi-stage decision problems [1]. Its implementation, however, is not straightforward: even in a deterministic scenario, its original form (Exact Dynamic Programming) can only be applied to a very limited number of cases because of its computational complexity. In most cases, at least one technique from the broad field of Approximate Dynamic Programming must be adopted to obtain a numerical solution to the optimization problem (see e.g. [2]), mitigating the so-called curse of dimensionality (the issue of computational complexity easily exploding as the dimensionality of the problem increases) which affects all DP-based algorithms. Because of this, building an algorithm based on Dynamic Programming often involves tailoring it to a specific optimization problem (see e.g. the comprehensive set of methods and applications discussed in [3]) and requires both technical knowledge of the problem at hand and a deep understanding of DP and its implementation challenges.

DynaProg is a general-purpose MATLAB software package for solving deterministic, finite horizon multi-stage decision problems with a user-friendly interface. The rationale behind its development is to provide a simple interface that allows the user to define an optimization problem of this class in a straightforward manner, without having to deal with the implementation of the optimization algorithm itself. Moreover, a fundamental aim of DynaProg is to provide a computationally fast optimization algorithm, as computational time is one of the main factors that limits the usage of Dynamic Programming algorithms.

This particular class of problems is found in many engineering and economic applications, and Dynamic Programming algorithms have been used with great benefit by many research communities studying various subjects such as vehicle routing problems [4], optimal control of hybrid-electric vehicles [5,6], battery health management [7], resource allocation problems [8], reservoir systems operation [9,10], hydrothermal coordination in power systems [11], sewer network management [12], and even natural ecosystems preservation [13].

Indeed, speed is DynaProg's main strength. The software was developed with state-of-the-art algorithms and relies on features (e.g. vectorization, implicit expansion) and best coding practices (e.g. argument validation) which exploit MATLAB's computational engine at its best as well as improve user-friendliness, based on our experience in the practical implementation of Dynamic Programming algorithms. Another prominent feature of the software is its flexibility and ease of use. The user can define the problem structure with a simple and consistent naming structure, and can define the system's dynamics with a simple MATLAB function (m-file [14]). Furthermore, advanced users can take advantage of DynaProg's well-documented, object-oriented design to fully customize and reuse the source code.

2. Software description

The core of the software is the DynaProg class, which allows the user to store the problem structure with its properties and solve it with its methods. The basic problem elements are a model for the system dynamics, a stage cost, and the initial state of the system. The state dynamics is a model which takes the form

x_{k+1} = f(x_k, u_k),  k = 0, 1, \dots, N-1,   (1)

where k indicates the current stage, N is the number of stages (the control horizon), x is the state of the system, and u is the control variable (the decision to be taken). Both x and u can be scalar or vector-valued, since the system may be characterized by several state variables and several control variables may need to be controlled. The stage cost takes the form g(x_k, u_k), so that the total cost incurred in the entire process is

J(x_0, u_0, \dots, u_{N-1}) = g_N(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k),   (2)

where g_N(x_N) is a terminal cost which may be incurred based on the final state of the system.

The main outputs of DynaProg are the optimal value of the total cost incurred in the entire decision problem

J^*(x_0) = \min_{u_k \in U_k(x_k),\ k = 0, \dots, N-1} J(x_0, u_0, \dots, u_{N-1})   (3)

and the optimal control sequence u_0^*, \dots, u_{N-1}^*, that is, the sequence of control variables that minimizes the total cost, subject to the constraint that each u_k^* must belong to the set of admissible control variables at stage k, i.e. U_k(x_k).

The core of DynaProg is a deterministic Dynamic Programming optimization algorithm, which is divided into a backward phase and a forward phase.

In the backward phase, the algorithm iteratively builds the optimal cost-to-go for each stage:

J_k^*(x_k) = \min_{u_k \in U_k(x_k)} \big( g_k(x_k, u_k) + J_{k+1}^*(f_k(x_k, u_k)) \big).   (4)

The optimal cost-to-go function for each stage is the optimal cost associated with the tail sub-problem which involves only the stages from that stage to the last. In other words, the cost-to-go function J_k^*(x_k) is the minimum cost incurred if the system must evolve from stage k to stage N, expressed as a function of the initial state x_k.

In general, it may not be possible to obtain an analytical expression for J_k^*(x_k), which would be required for a solution based on Exact Dynamic Programming. For this reason, DynaProg requires the user to define discrete computational grids for the state and control variables. Then, for each iteration k during the backward phase, it evaluates g_k(x_k, u_k) + J_{k+1}^*(f_k(x_k, u_k)) for all points belonging to the computational grids and it constructs a numerical approximation of J_k^*(x_k) by linear interpolation.

In the forward phase, the system's evolution is simulated starting from the initial state x_0. For each stage, the optimal control variables are determined as

u_k^*(x_k) = \operatorname{argmin}_{u_k \in U_k(x_k)} \big( g_k(x_k, u_k) + J_{k+1}^*(f_k(x_k, u_k)) \big)   (5)

and the simulation is advanced to the next stage, until the last stage is reached.
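To make the backward and forward phases concrete, the following is a minimal, self-contained MATLAB sketch of Eqs. (4) and (5) for a scalar state and control. It is purely illustrative and is not DynaProg's internal implementation, whose vectorized engine is considerably more elaborate; the dynamics, cost and grids are invented for the example.

% Illustrative backward/forward DP for a scalar toy problem (not DynaProg's code).
% Problem: steer x towards 0 with x_{k+1} = x_k + u_k and stage cost x_k^2 + u_k^2.
f = @(x, u) x + u;                 % system dynamics, Eq. (1)
g = @(x, u) x.^2 + u.^2;           % stage cost
N = 50;                            % number of stages
x_grid = linspace(-2, 2, 101);     % state computational grid
u_grid = linspace(-0.5, 0.5, 21);  % control computational grid

% Backward phase: build the cost-to-go on the state grid, Eq. (4)
J = zeros(N+1, numel(x_grid));     % J(k, :) approximates J_k^* on x_grid (zero terminal cost)
for k = N:-1:1
    for i = 1:numel(x_grid)
        x_next = f(x_grid(i), u_grid);                                 % all candidate controls
        J_next = interp1(x_grid, J(k+1, :), x_next, 'linear', inf);    % inf outside the grid
        J(k, i) = min(g(x_grid(i), u_grid) + J_next);
    end
end

% Forward phase: simulate from x_0 picking the minimizing control, Eq. (5)
x = 1.5;                           % initial state x_0
for k = 1:N
    J_next = interp1(x_grid, J(k+1, :), f(x, u_grid), 'linear', inf);
    [~, idx] = min(g(x, u_grid) + J_next);
    x = f(x, u_grid(idx));         % advance the simulation to the next stage
end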
2.1. Software architecture

The interface by which the user is able to define and solve a multi-stage deterministic optimal decision problem is the DynaProg class. In order to set up the problem and its settings, an instance of the class can be created by calling the class constructor:

prob = DynaProg(__);

The input arguments of the class constructor are used to define some mandatory arguments such as the dynamic system to be controlled, the computational grids mentioned in Sections 2 and 2.2.2 and the initial state values, as well as other optional arguments. The dynamic system in particular is passed to the class constructor as a function handle which points to a function contained in an external m-file. This allows the user to code the dynamics of the system with the greatest amount of flexibility.
The problem object is returned as the output of the class
constructor, and the settings defined by the user are stored as
properties of that object. The problem structure can then be
modified, if needed, by changing the values of the appropriate
properties.

2.2. Software functionalities

2.2.1. Defining the system's dynamics

The user can define the system's dynamics and stage cost by writing a function in an m-file, with the following structure:

[x_next, stage_cost] = myfun(x, u)

Then, the model function is passed to the DynaProg class constructor as a function handle:

problem = DynaProg(__, @myfun)

If the model function is not time-invariant or, in other words, it is also a function of some time-dependent exogenous input w, the system's dynamics and stage cost are specified with the following alternate structure:

[x_next, stage_cost] = myfun(x, u, w)

The exogenous inputs w can include all variables that do not depend on the state and control variables, but may depend on time. They may also simply be time itself.

For some problems, the user may want to define constraints on the state and control variables that can be reached or selected by the system. To do this, DynaProg allows the user to define a third output of the model function which marks, in the form of a logical variable, unfeasible states and/or control variables:

[x_next, stage_cost, unfeas] = myfun(x, u, w)
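As a concrete illustration of this structure, a minimal model function for a toy problem might look as follows. The system, cost and bounds are invented for the example (a storage-like scalar state that must stay between 0 and 1), and the way the exogenous input w is used inside the function is an assumption made for this sketch; only the three-output signature above is prescribed by the toolbox.

function [x_next, stage_cost, unfeas] = simple_storage(x, u, w)
% Illustrative model function (not part of DynaProg): x is a stored energy
% level, u is the charging power, and w is assumed to hold the current value
% of a single exogenous input (an energy price). All numbers are invented.
dt = 1;                                  % stage duration
x_next = x + u .* dt;                    % system dynamics, Eq. (1)
stage_cost = w .* max(u, 0) .* dt;       % stage cost: cost of the energy bought
unfeas = (x_next < 0) | (x_next > 1);    % logical flag for unfeasible states
end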
2.2.2. Defining the computational grids

DynaProg can be used to solve multi-stage optimal decision problems whether the state and control variables are discrete or continuous. However, the user must specify a discrete computational grid for each of the state and control variables, though for different reasons.

Control variables must be discretized, and the resulting optimal control trajectory will be restricted to the discrete grid specified by the user. This simplifies the numerical solution, as the min and argmin operations mentioned in Eqs. (4) and (5) reduce to finding minimum values over finite sets.

State variables, on the other hand, are not discretized in the forward simulation. However, building the cost-to-go function in the backward phase (as mentioned in Section 2) requires sampling it at a certain number of values of the state variables. For this reason, a discretized computational grid for the state variables is needed. The discretization of this computational grid for the state variables affects the computation of the cost-to-go function and, as such, it is a source of sub-optimality. Selecting the proper discretization level for the state variables grid usually requires some understanding of the physics behind the system under analysis and possibly some trial-and-error.

Both computational grids are entirely user-defined. The user can therefore decide whether to adopt uniform grids or experiment with non-uniform grids, which may allow reducing the grid size but may also bias the solution if not properly designed. However, DynaProg does not currently include any functionality to automatically design a non-uniform computational grid, and it is left to the user to do so based on their field expertise in the optimization problem being studied.
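For instance, uniform and non-uniform grids can both be built with standard MATLAB constructs. In the sketch below the variable names match those used later in Listing 4, while the grid bounds and spacings are arbitrary values chosen for illustration, not recommended settings.

% Uniform state grid: evenly spaced SOC samples between 0.4 and 0.7
x_grid = {0.4:0.005:0.7};

% A possible non-uniform alternative: finer sampling near the SOC target (0.6),
% coarser elsewhere. Poorly placed grid points may bias the solution.
x_grid_nonuniform = {unique([0.4:0.02:0.7, 0.55:0.0025:0.65])};

% Control grids: gear number (inherently discrete) and torque split ratio
u_grid = {1:5, -1:0.1:1};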
Fig. 1. Scheme of a p2 parallel HEV powertrain.

3. Illustrative example: optimal control of a hybrid electric vehicle

3.1. Problem definition

Consider a p2 parallel Hybrid Electric Vehicle (HEV) powertrain schematized in Fig. 1.

Assume that the vehicle must drive following a speed trace in time, such as the speed trace defined by a regulatory driving cycle, while minimizing the fuel consumption in order to maximize the benefits of hybridization. The objective of this example is to design a control strategy which defines the gear number to be engaged by the gearbox and the torque that the electrical machine (EM) must provide or absorb. Furthermore, we must make sure that the battery's state of charge at the end of the drive cycle is equal to its initial value.

The vehicle speed and acceleration in time are treated as exogenous inputs, while the control variables are the gear number and the EM torque ratio, defined as

\tau = T_{EM} / T_{req}, \quad -1 \le \tau \le 1,   (6)

where T_{EM} is the torque provided (if positive) or absorbed (if negative) by the electrical machine and T_{req} is the torque requested at the powertrain level to make the vehicle follow the given speed trace. As such, T_{req} is a function of the vehicle's speed and the gear number.

Since we must set a constraint on the terminal value of the battery's state of charge (SOC), we must be able to assign an initial condition to it and track its evolution in time (in other words, throughout the decision problem's stages). Thus, the battery SOC is set as a state variable, and we must define its dynamics.

The SOC dynamics is derived using a simple battery internal resistance model:

I_{batt} = \big( V_{OC} - \sqrt{V_{OC}^2 - 4 R_{eq} P_{batt}} \big) / (2 R_{eq}),   (7)

\dot{SOC} = I_{batt} / C_{batt}.   (8)

Here I_{batt}, V_{OC}, R_{eq} and C_{batt} are the battery current, open-circuit voltage, equivalent resistance and capacity.

The battery power P_{batt} is evaluated as

P_{batt} = \begin{cases} \eta_{inv} P_{EM} & \text{if } P_{EM} \ge 0, \\ P_{EM} / \eta_{inv} & \text{if } P_{EM} < 0, \end{cases}   (9)

and the power absorbed or generated by the electrical machine P_{EM} is evaluated as

P_{EM} = \begin{cases} \eta_{EM}\, \omega_{EM} T_{EM} & \text{if } T_{EM}\,\omega_{EM} \ge 0, \\ \omega_{EM} T_{EM} / \eta_{EM} & \text{if } T_{EM}\,\omega_{EM} < 0, \end{cases}   (10)

with \eta_{EM} and \eta_{inv} being the efficiencies of the electrical machine and the inverter (or any other power electronics).

Finally, the engine's fuel consumption is evaluated based on the engine speed and torque with its fuel consumption map \dot{m}_{fuel}(\omega_{eng}, T_{eng}).

In order for the optimization problem to be meaningful, we must set a series of constraints that reflect the operational constraints of an actual powertrain.

The engine's torque cannot exceed its limit torque (which is in turn dependent on its speed):

T_{eng} = (1 - \tau)\, T_{req} \le T_{eng,lim}(\omega_{eng}).   (11)

The electrical machine's torque must stay within its limit torque in generation and motor mode:

T_{EM,gen,lim} \le T_{EM} \le T_{EM,mot,lim}.   (12)

When braking (i.e. T_{req} < 0), the electrical machine should never operate in motor mode.

The terminal SOC must be equal to its initial value:

SOC_0 = SOC_N.   (13)

Also, the battery is characterized by a maximum discharge power and a maximum charge current that must not be exceeded:

P_{batt} \le P_{batt,max},   (14)

I_{batt} \ge I_{batt,chrg,lim}.   (15)
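To see how Eqs. (7) to (10) and the battery-side constraints (14) and (15) might translate into code inside a model function, consider the hedged sketch below. The parameter and variable names (eta_em, eta_inv, V_oc, R_eq, C_batt, P_batt_max, I_chrg_lim, T_em, w_em, soc, dt) and the forward-Euler SOC update are assumptions made for this illustration; they do not necessarily match the data structures of the documented example.

% Hedged sketch of the battery-side calculations inside the model function.
% All names are illustrative; none are prescribed by DynaProg.

% Electrical machine power, Eq. (10)
motoring = (T_em .* w_em >= 0);
P_em = motoring .* (eta_em .* w_em .* T_em) + ~motoring .* (w_em .* T_em ./ eta_em);

% Battery power, Eq. (9)
discharging = (P_em >= 0);
P_batt = discharging .* (eta_inv .* P_em) + ~discharging .* (P_em ./ eta_inv);

% Battery current and SOC dynamics, Eqs. (7) and (8)
I_batt = (V_oc - sqrt(V_oc.^2 - 4 .* R_eq .* P_batt)) ./ (2 .* R_eq);
soc_next = soc + (I_batt ./ C_batt) .* dt;   % forward-Euler step over one stage (assumption)

% Battery operating limits, Eqs. (14) and (15), feeding the unfeas output
unfeas_batt = (P_batt > P_batt_max) | (I_batt < I_chrg_lim);
% A negative discriminant in Eq. (7) also indicates an unfeasible power request
unfeas_batt = unfeas_batt | (V_oc.^2 - 4 .* R_eq .* P_batt < 0);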
3.2. Defining the model function

First, the model function must be created by the user. In this example, the vehicle speed and acceleration are treated as exogenous inputs. Therefore, the basic function signature accepts three inputs (the state variables, the control variables, and the exogenous inputs) and returns three outputs (the updated state variables, the stage cost, and the unfeasibility tensor which is used to set the model constraints).

Listing 1: Basic model function signature.

function [x_new, stageCost, unfeas] = hev(x, u, w)
...
end

Also, for practical purposes, it is convenient to define all constant parameters that characterize the powertrain (such as the engine limit torque characteristic T_{eng,lim}(\omega_{eng}) and fuel consumption map \dot{m}_{fuel}(\omega_{eng}, T_{eng}), the electrical machine efficiency map \eta_{EM}(\omega_{EM}, T_{EM}), the gearbox speed ratios, etc.) outside of the model function and pass them to it as additional inputs. In this example, the vehicle data is stored in six structures (veh, fd, gb, eng, em and batt).

Listing 2: Model function signature modified to accept additional inputs.

function [x_new, stageCost, unfeas] = hev(x, u, w, veh, fd, gb, eng, em, batt)
...
end

Finally, it might be interesting to also include in the optimization results the time profiles of physical quantities other than the state variables, control variables and cost. This can be done by changing the function signature to return additional outputs, starting from the fourth positional output.

Listing 3: Model function signature modified to return additional outputs.

function [x_new, stageCost, unfeas, engTrq, emTrq] = hev(x, u, w, veh, fd, gb, eng, em, batt)
...
end

The model function must perform all operations required to evaluate the updated state variables and the stage cost, and to define constraints via the unfeasibility tensor. The full code for this example is included in the software documentation and can be accessed by entering open('hev') in MATLAB's command window.
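Inside such a model function, the stage cost is the fuel consumed over one stage, which would typically be obtained by looking up the fuel consumption map at the current engine operating point. A hedged sketch of that lookup is shown below; the field names (eng.map_speed, eng.map_torque, eng.fuel_map, eng.max_torque, em.min_torque, em.max_torque), the local variables (tau, T_req, T_em, w_eng, dt) and the use of interp2 are assumptions for illustration, not the actual code of the documented example.

% Hedged sketch: stage cost as fuel mass consumed over one stage of length dt.
% eng.map_speed and eng.map_torque are assumed grid vectors of the fuel map,
% and eng.fuel_map the corresponding fuel rate values.
T_eng = (1 - tau) .* T_req;                     % engine torque, Eq. (11)
fuel_rate = interp2(eng.map_speed, eng.map_torque, eng.fuel_map, ...
                    w_eng, T_eng, 'linear');    % fuel rate at the operating point
stageCost = fuel_rate .* dt;                    % fuel consumed in this stage

% Engine and EM torque limits, Eqs. (11) and (12), flagged as unfeasible points
unfeas = (T_eng > interp1(eng.map_speed, eng.max_torque, w_eng)) | ...
         (T_em < em.min_torque) | (T_em > em.max_torque);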
3.3. Setting up and solving the optimization problem

The optimization problem must be set up and solved in a separate script. The script must define the discrete computational grids for the state and control variables as well as the initial conditions and (optionally) terminal constraints for the state variable. The optimization settings for this example are shown in Listing 4.

Listing 4: Preparing the optimization problem settings.

% State variable grid
SVnames = "SOC";
x_grid = {0.4:0.005:0.7};
% Initial state
x_init = {0.6};
% Final state constraints
x_final = {[0.6 0.6]};

% Control variable grid
CVnames = ["Gear Number", "Torque split"];
u1_grid = [1 2 3 4 5];
u2_grid = -1:0.1:1;
u_grid = {u1_grid, u2_grid};

% Load a drive cycle
load UDDS % contains velocity and time vectors
dt = time_s(2) - time_s(1);
% Create exogenous input
w{1} = speed_kmh./3.6;
w{2} = [diff(w{1})/dt; 0];

% Number of stages (time intervals)
Nint = length(time_s);

% Generate and store vehicle data
[veh, fd, gb, eng, em, batt] = data();

All the information needed to define the optimization problem is then used to create a problem structure, that is, an instance of the DynaProg class, as shown in Listing 5. Note how the vehicle speed and acceleration are passed to the constructor as exogenous inputs.

Listing 5: Constructing the problem structure.

prob = DynaProg(x_grid, x_init, x_final, u_grid, Nint, @(x, u, w) hev(x, u, w, veh, fd, gb, eng, em, batt), 'ExogenousInput', w);

The optimization problem can then be solved by using the run method, which returns the problem structure with the simulation results. The same problem object can then be passed to the plot method to visualize the optimal state variables, control variables and cumulative cost trajectories (see Fig. 2), as shown in Listing 6.
Listing 6: Running the optimization problem.

prob = run(prob);
plot(prob)

Fig. 2. Optimization results for the p2 HEV example.
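Since computational speed is one of the toolbox's stated goals, it can also be useful to time the solver on a given problem. A simple way to do so with standard MATLAB timing functions is sketched below; this is illustrative usage, not part of the published example.

% Time the DP solution of the problem defined in Listings 4 and 5
tic;
prob = run(prob);
elapsed = toc;
fprintf('DP solution computed in %.2f s over %d stages.\n', elapsed, Nint);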
4. Impact

There are many fields where Dynamic Programming is already a well-established method for solving classes of decision problems such as the ones listed in Section 1. For researchers in these fields, DynaProg may speed up research activity by allowing them to set up their own optimization problem and solve it with a fast algorithm, without having to deal with the implementation challenges of Dynamic Programming. For the same reasons, DynaProg may also enable researchers to develop and experiment with new methods even in research fields where Dynamic Programming has not been previously used.

Currently, DynaProg is being used by sustainable mobility groups at Politecnico di Torino to investigate optimal control strategies for hybrid-electric vehicles and optimal design of hybrid-electric powertrains, and it is also under evaluation at other institutions. In this context, DynaProg has already proven its worth by enabling relatively complex models to be explored with significantly reduced computational times.

The DynaProg package provides an easy, flexible, well-documented and computationally fast tool that allows researchers to obtain the (approximate) global solution of any finite horizon, multi-stage deterministic optimal decision problem, regardless of the field of application. The package syntax and documentation were carefully designed to prevent it from being tied to a specific research topic. The authors hope that these features will stimulate the adoption of Dynamic Programming-based optimization in new research fields.

Moreover, DynaProg's code was specifically designed with transparency and reusability in mind. One of the main design goals was to allow users to extend and customize both its user interface and its computational core. The reason for this is that it will allow researchers who work in fields where Dynamic Programming optimization algorithms are already an established practice to improve their understanding of the related computational hazards and even develop their own optimization algorithms tailored to their own research objectives.

5. Conclusions

DynaProg was designed to provide a reliable, versatile and well-documented tool for multi-stage deterministic optimal decision problems. Its development stems from the authors' experience in optimal control of Hybrid-Electric Vehicles, but the tool and the documentation were specifically designed to make it problem-independent.

We hope that this will enable many researchers facing this class of optimization problems to exploit Dynamic Programming in their research without having to invest a long time in becoming experienced in this technique.

By making the code fully open source, we also hope that those researchers who are instead experienced in Dynamic Programming will be able to contribute their own extensions or improvements. Examples of potential extensions include replacing the functional approximator for the cost-to-go function mentioned in Section 2 with more sophisticated solutions (such as the ones mentioned in [15]) or embedding tools for assisted/automated creation of non-uniform computational grids.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Bertsekas DP. Dynamic programming and optimal control, vol. 1. 4th ed. Belmont, Massachusetts: Athena Scientific; 2016.
[2] Bertsekas DP. Dynamic programming and optimal control, vol. 2. 4th ed. Belmont, Massachusetts: Athena Scientific; 2012.
[3] Brandimarte P. From shortest paths to reinforcement learning: a MATLAB-based tutorial on dynamic programming. 2021.
[4] Novoa C, Storer R. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands. European J Oper Res 2009;196(2):509-15. http://dx.doi.org/10.1016/j.ejor.2008.03.023.
[5] Pérez LV, Bossio GR, Moitre D, García GO. Optimization of power management in an hybrid electric vehicle using dynamic programming. In: Applied and computational mathematics - selected papers of the fifth PanAmerican workshop - June 21-25, 2004, Tegucigalpa, Honduras. Math Comput Simulation 2006;73(1):244-54. http://dx.doi.org/10.1016/j.matcom.2006.06.016.
[6] Brahma A, Guezennec Y, Rizzoni G. Optimal energy management in series hybrid electric vehicles. In: Proceedings of the 2000 American control conference, vol. 1. 2000, p. 60-4. http://dx.doi.org/10.1109/ACC.2000.878772.
[7] Moura SJ, Forman JC, Bashash S, Stein JL, Fathy HK. Optimal control of film growth in lithium-ion battery packs via relay switches. IEEE Trans Ind Electron 2011;58(8):3555-66. http://dx.doi.org/10.1109/TIE.2010.2087294.
[8] Forootani A, Iervolino R, Tipaldi M, Neilson J. Approximate dynamic programming for stochastic resource allocation problems. IEEE/CAA J Autom Sin 2020;7(4):975-90. http://dx.doi.org/10.1109/JAS.2020.1003231.
[9] Rani D, Moreira MM. Simulation-optimization modeling: A survey and potential application in reservoir systems operation. Water Resour Manag 2010;24(6):1107-38. http://dx.doi.org/10.1007/s11269-009-9488-0.
[10] Zhang Z, Zhang S, Wang Y, Jiang Y, Wang H. Use of parallel deterministic dynamic programming and hierarchical adaptive genetic algorithm for reservoir operation optimization. Comput Ind Eng 2013;65(2):310-21. http://dx.doi.org/10.1016/j.cie.2013.02.003.
[11] Yang Jin-Shyr, Chen Nanming. Short term hydrothermal coordination using multi-pass dynamic programming. IEEE Trans Power Syst 1989;4(3):1050-6. http://dx.doi.org/10.1109/59.32598.
[12] Abraham DM, Wirahadikusumah R, Short TJ, Shahbahrami S. Optimization modeling for sewer network management. J Constr Eng Manag 1998;124(5):402-10. http://dx.doi.org/10.1061/(ASCE)0733-9364(1998)124:5(402).
[13] Odom DIS, Cacho OJ, Sinden JA, Griffith GR. Policies for the management of weeds in natural ecosystems: the case of scotch broom (Cytisus scoparius, L.) in an Australian national park. Ecol Econom 2003;17.
[14] The Mathworks Inc. MATLAB primer (user's guide). 2020.
[15] Gaggero M, Gnecco G, Sanguineti M. Dynamic programming and value-function approximation in sequential decision problems: Error analysis and numerical results. J Optim Theory Appl 2013;156(2):380-416. http://dx.doi.org/10.1007/s10957-012-0118-2.
