Method of Simulated Moments Estimation (MSM) — EKW lectures documentation (2025)

This lecture heavily relies on respy’s method of simulated moments (MSM) interface. A guide to using the interface can be found here.

Table of Contents

1Introduction
2Observed Data
3Data Moments
4Weighting Matrix
5Criterion Function
6Estimation Exercise
- 6.1Estimation for one parameter
  - 6.1.1Simulated Moments
  - 6.1.2Optimization Procedure
    - 6.1.2.1Upper and Lower Bounds
    - 6.1.2.2Constraints
    - 6.1.2.3Optimize
- 6.2Chatter in the Criterion Function
  - 6.2.1Changing the Simulation Seed
  - 6.2.2Multiple Simulation Seeds
  - 6.2.3Changing the Simulation Seed for increasing Sample Size of Simulated Agents
  - 6.2.4Changing the Simulation Seed for increasing Sample Size of Simulated and Observed Agents
- 6.3Moving away from the Optimum
  - 6.3.1Retrieving the true Parameter Vector
- 6.4Derivative-Based Optimization Algorithm
7References

Introduction

The method of simulated moments approach to estimating model parameters is to minimize a certain distance between observed moments and simulated moments with respect to the parameters that generate the simulated model.

Denote \(X\) our observed data and \(m(X)\) the vector of observed moments. To construct the criterion function, we use the parameter vector \(\theta\) to simulate data from the model \(\hat{X}\). We can then calculate the simulated moments \(m(\hat{X}| \theta)\).

The criterion function is then given by

\begin{equation}\psi(\theta) = (m(X) - m(\hat{X}| \theta))'\Omega(m(X) - m(\hat{X}| \theta))\end{equation}

where the difference between observed and simulated moments \((m(X) - m(\hat{X}| \theta))\) constitutes a vector of the dimension \(1\times M\) with \(1,..,M\) denoting the number of moments. The \(M\times M\) weighting matrix is given by \(\Omega\).

The MSM estimator is defined as the solution to

\begin{equation}\hat{\theta}(\Omega) = \underset{\theta}{\operatorname{argmin}} \psi(\theta).\end{equation}

The criterion function is thus a strictly positive scalar and the estimator depends on the choice of moments \(m\) and the weighting matrix \(\Omega\). The weighting matrix applies some scaling for discrepancies between the observed and simulated moments. If we use the identity matrix, each moment is given equal weight and the criterion function reduces to the sum of squared moment deviations.

Aside from the choice of moments and weighting matrix, some other important choices that influence the the estimation are the simulator itself and the algorithm and specifications for the optimization procedure. Many explanations for simulated method of moments estimation also feature the number of simulations as a factor that is to be determined for estimation. We can ignore this factor for now since we are working with a large simulated dataset.

In the following we will set up the data, moments and weighting matrix needed to construct the criterion function and subsequently try to estimating the parameters of the model using this MSM setup.

Observed Data

We generate our model and data using respy and a simple Robinson Crusoe model. In this model, the agent, Robinson Crusoe, in each time period decides between two choice options: working (i.e. going fishing) or spending time in the hammock.

We can use respy to simulate the data for this exercise.

[2]:

params_true, options = load_model_specs()

[3]:

# Generate observed data from model.simulate = rp.get_simulate_func(params_true, options)data_obs = simulate(params_true)

Let’s take a look at the model specifications.

[4]:

params_true

[4]:

		value
category	name
delta	delta	0.950
wage_fishing	exp_fishing	0.070
nonpec_fishing	constant	-0.100
nonpec_hammock	constant	1.046
shocks_sdcorr	sd_fishing	0.010
	sd_hammock	0.010
	corr_hammock_fishing	0.000

[5]:

options

[5]:

{'estimation_draws': 100, 'estimation_seed': 100, 'estimation_tau': 0.001, 'interpolation_points': -1, 'n_periods': 5, 'simulation_agents': 1000, 'simulation_seed': 132, 'solution_draws': 100, 'solution_seed': 456, 'covariates': {'constant': '1'}}

[6]:

data_obs.head(10)

[6]:

		Experience_Fishing	Shock_Reward_Fishing	Meas_Error_Wage_Fishing	Shock_Reward_Hammock	Meas_Error_Wage_Hammock	Dense_Key	Core_Index	Choice	Wage	Discount_Rate	...	Nonpecuniary_Reward_Fishing	Wage_Fishing	Flow_Utility_Fishing	Value_Function_Fishing	Continuation_Value_Fishing	Nonpecuniary_Reward_Hammock	Wage_Hammock	Flow_Utility_Hammock	Value_Function_Hammock	Continuation_Value_Hammock
Identifier	Period
0	0	0	0.999650	1	0.000410	1	0	0	fishing	0.999650	0.95	...	-0.1	0.999650	0.899650	4.739374	4.041815	1.046	NaN	1.046410	4.732282	3.879866
	1	1	1.000743	1	0.015065	1	1	1	fishing	1.073305	0.95	...	-0.1	1.073305	0.973305	4.042189	3.230405	1.046	NaN	1.061065	3.905840	2.994500
	2	2	0.996461	1	0.011853	1	2	2	fishing	1.146203	0.95	...	-0.1	1.146203	1.046203	3.224430	2.292871	1.046	NaN	1.057853	3.076295	2.124675
	3	3	0.998907	1	-0.007859	1	3	3	fishing	1.232329	0.95	...	-0.1	1.232329	1.132329	2.292998	1.221756	1.046	NaN	1.038141	2.113919	1.132398
	4	4	0.989419	1	0.012452	1	4	4	fishing	1.309130	0.95	...	-0.1	1.309130	1.209130	1.209130	0.000000	1.046	NaN	1.058452	1.058452	0.000000
1	0	0	0.992894	1	0.014190	1	0	0	hammock	NaN	0.95	...	-0.1	0.992894	0.892894	4.732618	4.041815	1.046	NaN	1.060190	4.746062	3.879866
	1	0	0.995369	1	-0.003848	1	1	0	hammock	NaN	0.95	...	-0.1	0.995369	0.895369	3.740144	2.994500	1.046	NaN	1.042152	3.876343	2.983359
	2	0	1.000668	1	-0.008441	1	2	0	hammock	NaN	0.95	...	-0.1	1.000668	0.900668	2.837176	2.038430	1.046	NaN	1.037559	2.974068	2.038430
	3	0	0.992542	1	-0.013488	1	3	0	hammock	NaN	0.95	...	-0.1	0.992542	0.892542	1.885496	1.045215	1.046	NaN	1.032512	2.025466	1.045215
	4	0	0.998695	1	-0.005397	1	4	0	hammock	NaN	0.95	...	-0.1	0.998695	0.898695	0.898695	0.000000	1.046	NaN	1.040603	1.040603	0.000000

10 rows × 21 columns

Data Moments

For the setup of the estimation we first have to choose a set of moments that we will use to match the observed data and the simulated model. For this model we include two sets of moments:

The first set are Robinson’s choice frequencies (choice frequencies here refers to the share of agents that have chosen a specific option) for each period.
The second set are moments that characterize the wage distribution for each period, i.e. the mean of the wage of all agents that have chosen fishing in a given period and the standard deviation of the wages.

We need a functions that compute the set of moments on the observed and simulated data.

[7]:

def calc_choice_frequencies(df): """Calculate choice frequencies""" return df.groupby("Period").Choice.value_counts(normalize=True).unstack()

[8]:

def calc_wage_distribution(df): """Calculate wage distribution.""" return df.groupby(["Period"])["Wage"].describe()[["mean", "std"]]

[9]:

# Save functions in a dictionary to pass to the criterion function later on.calc_moments = { "Choice Frequencies": calc_choice_frequencies, "Wage Distribution": calc_wage_distribution,}

Respy’s MSM interface additionally requires us to specify, how missings in the computed moments should be handled. Missing moments for instance can arise, when certain choice options aren’t picked at all for some periods. We just specify a function that replaces missings with 0.

[10]:

def replace_nans(df): """Replace missing values in data.""" return df.fillna(0)

Now we are ready to calculate the moments.

[11]:

moments_obs = { "Choice Frequencies": replace_nans(calc_moments["Choice Frequencies"](data_obs)), "Wage Distribution": replace_nans(calc_moments["Wage Distribution"](data_obs)),}

[12]:

print("Choice Frequencies")print(moments_obs["Choice Frequencies"])print("\n Wage Distribution")print(moments_obs["Wage Distribution"])

Choice Frequencies fishing hammockPeriod0 0.698 0.3021 0.698 0.3022 0.698 0.3023 0.698 0.3024 0.698 0.302 Wage Distribution mean stdPeriod0 1.003189 0.0086401 1.072815 0.0107372 1.150106 0.0123163 1.234041 0.0124064 1.322711 0.013473

Weighting Matrix

Next we specify a weighting matrix. It needs to be a square matrix with the same number of diagonal elements as there are moments. One option would be to use the identity matrix, but we use a weighting matrix that adjusts for the variance of each moment to improve the estimation process. The variances for the moments are constructed using a bootstrapping procedure.

[13]:

def get_weighting_matrix(data, calc_moments, num_boots, num_agents_msm): """ Compute weighting matrix for estimation with MSM.""" # Seed for reproducibility. np.random.seed(123) index_base = data.index.get_level_values("Identifier").unique() # Create bootstrapped moments. moments_sample = list() for _ in range(num_boots): ids_boot = np.random.choice(index_base, num_agents_msm, replace=False) moments_boot = [calc_moments[key](data.loc[ids_boot, :]) for key in calc_moments.keys()] moments_boot = rp.get_flat_moments(moments_boot) moments_sample.append(moments_boot) # Compute variance for each moment and construct diagonal weighting matrix. moments_var = np.array(moments_sample).var(axis=0) weighting_matrix = np.diag(moments_var ** (-1)) return np.nan_to_num(weighting_matrix)

[14]:

W = get_weighting_matrix(data_obs, calc_moments, 300, 500)

[15]:

pd.DataFrame(W)

[15]:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	4795.115376	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
1	0.000000	4795.115376	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
2	0.000000	0.000000	4795.115376	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
3	0.000000	0.000000	0.000000	4795.115376	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
4	0.000000	0.000000	0.000000	0.000000	4795.115376	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
5	0.000000	0.000000	0.000000	0.000000	0.000000	4795.115376	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
6	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	4795.115376	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
7	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	4795.115376	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
8	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	4795.115376	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
9	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	4795.115376	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
10	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	9.466832e+06	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
11	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	6.036166e+06	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
12	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	4.698312e+06	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
13	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	4.248616e+06	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
14	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	3.566318e+06	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
15	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.779242e+07	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
16	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.258966e+07	0.000000e+00	0.000000e+00	0.000000e+00
17	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	9.317930e+06	0.000000e+00	0.000000e+00
18	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	8.904094e+06	0.000000e+00
19	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	1.067252e+07

Criterion Function

We have collected the observed data for our model, chosen the set of moments we want to use for estimation and defined a weighting matrix based on these moments. We can now set up the criterion function to use for estimation.

As already discussed above, the criterion function is given by the weighted square product of the difference between observed moments \(m(X)\) and simulated moments \(m(\hat{X}| \theta)\). Trivially, if we have that \(m(X) = m(\hat{X}| \theta)\), the criterion function returns a value of 0. Thus, the closer \(\theta\) is to the real parameter vector, the smaller should be the value for the criterion function.

[16]:

criterion_msm = rp.get_moment_errors_func( params_true, options, calc_moments, replace_nans, moments_obs, W)

Criterion function at the true parameter vector:

[17]:

fval = criterion_msm(params_true)fval

[17]:

0.0

We can plot the criterion function to examine its behavior around the minimum in more detail. The plots below show the criterion function at varying values of all parameters in the the paramter vector.

[18]:

plot_criterion_params(params_true, list(params_true.index), criterion_msm, 0.05)

This depiction for most parameters conceals the fact that the criterion function is not a smooth function of our parameter values. We can reveal this property if we ‘zoom in’ on the function far enough. The plots below show the criterion function for varying values of delta around the true minimum value of 0.95. We can see that the function exhibits small plateaus and is thus not completely smooth.

[19]:

for radius in [0.05, 0.01, 0.001, 0.0001]: plot_criterion_params(params_true, list(params_true.index)[0:1], criterion_msm, radius)

Estimation Exercise

In the following we will conduct a simulation exercise to estimate the parameter vector using our criterion function and weighting matrix. We will begin by simulating data using the new parameter vector and examine how the simulated moments differ from the observed ones. We will then use an optimizer to minimize the criterion function in order to retrieve the true parameter vector. Additionally, we will explore how the criterion function behaves if we change the simulation seed, move away fromthe true values and test the optimization for a derivative-based optimization algorithm.

Estimation for one parameter

For now, our candidate parameter vector will just differ in delta from the true parameters.

[20]:

params_cand = params_true.copy()params_cand.loc["delta", "value"] = 0.93

Simulated Moments

We can now use our model to simulate data using the candidate parameter vector. We can see that the choice frequencies and wage distribution differ from the moments of the observed dataset.

[21]:

params = params_cand.copy()simulate = rp.get_simulate_func(params, options)df_sim = simulate(params)

[22]:

moments_sim = { "Choice Frequencies": replace_nans(calc_moments["Choice Frequencies"](df_sim)), "Wage Distribution": replace_nans(calc_moments["Wage Distribution"](df_sim)),}

We can plot the moments to compare the choice frequencies for each period.

[23]:

plot_moments_choices(moments_obs, moments_sim)

The plots below show the mean and the standard deviation in the wages for each period.

[24]:

plot_moments_wage(moments_obs, moments_sim)

The criterion function value for the candidate parameter vector is not zero.

[25]:

fval = criterion_msm(params_cand)fval

[25]:

7838.131228444257

Optimization Procedure

We will now use an optimization procedure to retrieve the true parameter vector. For the optimization we can use estimagic. In order to minimize the criterion function we need estimagic’s minimize function.

[26]:

from estimagic import minimize # noqa: E402

We have verified above that the criterion function gives a value of 0 for the true parameter vector. Before we try different parameter specifications, we can check whether an optimizer recognizes the true vector as the minimum of our criterion function.

As the code below shows, the optimization algorithm recognizes the true parameter vector as the minimum of the criterion function as it returns a function value of 0 and the true parameter values.

[27]:

rslt = minimize( criterion=criterion_msm, params=params_true, algorithm="nag_pybobyqa",)rslt["solution_criterion"]

[27]:

0.0

Upper and Lower Bounds

We can help the optimizer by specifying bounds for the parameters. Since we know the true parameters in the case of this model, we can just pick upper and lower bounds that are fairly close to the true values of the parameters to aid the optimizer in the search for the optimum. By default, the upper and lower bounds are set to \(\infty\) and \(-\infty\), so specifying upper and lower bounds substantially reduces the range of parameter values that the optimizer can potentially cover.

For optimization with estimagic, we can specify bounds by adding the columns ‘lower’ and ‘upper’ to the dataframe that contains the parameter values.

[28]:

params_cand["lower_bound"] = [0.69, 0.066, -0.2, 1, 0, 0, 0]params_cand["upper_bound"] = [1, 0.078, -0.09, 1.1, 0.5, 0.5, 0.5]params_cand

[28]:

		value	lower_bound	upper_bound
category	name
delta	delta	0.930	0.690	1.000
wage_fishing	exp_fishing	0.070	0.066	0.078
nonpec_fishing	constant	-0.100	-0.200	-0.090
nonpec_hammock	constant	1.046	1.000	1.100
shocks_sdcorr	sd_fishing	0.010	0.000	0.500
	sd_hammock	0.010	0.000	0.500
	corr_hammock_fishing	0.000	0.000	0.500

Constraints

Additionally we hold all other parameters fixed for now to aid the optimizer in finding the optimal value for delta.

[29]:

# Define base constraints to use for the rest of the notebook.constr_base = [ {"loc": "shocks_sdcorr", "type": "sdcorr"}, {"loc": "delta", "type": "fixed"}, {"loc": "wage_fishing", "type": "fixed"}, {"loc": "nonpec_fishing", "type": "fixed"}, {"loc": "nonpec_hammock", "type": "fixed"}, {"loc": "shocks_sdcorr", "type": "fixed"},]

[30]:

# Remove constraint for delta.constr = constr_base.copy()constr.remove({"loc": "delta", "type": "fixed"})

Optimize

We can now optimize the criterion function with respect to the parameter vector. The optimizer manages to reach a function value of 0 and finds an approximately correct delta for our model.

This exercise again reveals that we are dealing with a non-smooth criterion function. The optimizer does not return the exact value of 0.95 for delta because of the little plateaus we saw when zooming into the criterion function. As the plot shows, there is a small area around the true value for delta that also returns a function value of 0 and might thus be picked by the optimizer.

[31]:

rslt = minimize( criterion=criterion_msm, params=params_cand, algorithm="nag_pybobyqa", constraints=constr,)rslt["solution_criterion"]

[31]:

0.5248952065747418

Chatter in the Criterion Function

In this exercise we explore the sensitivity of the criterion function to the choice of simulation seed and number of agents.

Changing the Simulation Seed

As shown above, the optimizer manages to find a function value of exactly 0 for the true parameter vector. This is the case because respy controls the realization of random elements in the simulation process via a simulation seed. The model thus always simulates the exact same dataset for a given parameter vector. Our criterion function becomes exactly 0 at the true parameter vector because for this vector, the observed and simulated data are identical.

Changing the simulation seed results in simulated moments that, even for the true parameter vector, are never completely equal to the observed moments.

Let’s estimate the model with a different simulation seed.

[32]:

options_new_seed = options.copy()options_new_seed["simulation_seed"] = 400

We can see that the criterion function is not 0 anymore for the true parameter vector.

[33]:

criterion_msm = rp.get_moment_errors_func( params_true, options_new_seed, calc_moments, replace_nans, moments_obs, W)criterion_msm(params_true)

[33]:

38.97135021944551

Our optimizer thus also does not return a function value of 0 for the true parameter vector.

[34]:

rslt_new_seed = minimize( criterion=criterion_msm, params=params_true, algorithm="nag_pybobyqa", constraints=constr,)rslt_new_seed["solution_criterion"]

[34]:

17.369549302805428

Since the optimizer does not even recognize the true parameter vector, it is also not able to reach a criterion function value of 0 for a different parameter vector.

[35]:

rslt_new_seed = minimize( criterion=criterion_msm, params=params_cand, algorithm="nag_pybobyqa", constraints=constr,)rslt_new_seed["solution_criterion"]

[35]:

17.369549302805428

Multiple Simulation Seeds

The section above shows what happens to the criterion function if we change the seed for simulating the data. In the sections below we will repeat this exercise for multiple seeds and explore what happens if we increase the number of simulated as well as observed agents in our model.

[36]:

# List of seeds, includes true simulation seed of 132.seeds = list(range(120, 140, 1))

[37]:

# Define other inputs for criterion functioncriterion_kwargs = dict()criterion_kwargs["params"] = params_true.copy()criterion_kwargs["options"] = optionscriterion_kwargs["calc_moments"] = calc_momentscriterion_kwargs["replace_nans"] = replace_nanscriterion_kwargs["moments_obs"] = moments_obscriterion_kwargs["weighting_matrix"] = W

The plot below shows the criterion function for different simulation seeds, including the true one. We can seed that different seeds lead to different fits between the simulated and observed data, with some seeds performing worse than others. Only the true seed leads to a function value of 0.

[38]:

plot_chatter(seeds, criterion_kwargs)

Changing the Simulation Seed for increasing Sample Size of Simulated Agents

We now repeat this exercise for an increasing number of simulated agents. As the plot below shows, increasing the number of simulated agents reduces the chatter considerably but the criterion function remains consistently above zero. Only the simulated sample of 1000 agents at the true simulation seed reaches a function of zero, since, as discussed before, the simulated sample is identical to the observed one for this calibration.

[39]:

# Changing the number of agents.num_agents = [500, 1000, 5000, 10000, 50000]

[40]:

plot_chatter_numagents_sim(seeds, num_agents, criterion_kwargs)

Changing the Simulation Seed for increasing Sample Size of Simulated and Observed Agents

We now increase not only the number of simulated agents but also the number of observed agents in our sample. Doing so decreases the chatter in the criterion function as before but also levels out the function around 0 for all simulation seeds. For large enough samples of observed and simulated agents the choice of simulation seed is thus irrelevant.

[41]:

plot_chatter_numagents_both(seeds, num_agents, calc_moments, replace_nans, criterion_kwargs)

Note: In contrast to the previous plot, we can see that the function reaches a value of 0 at the true simulation seed for all quantities of agents. This is the case because we now simultaneously increase the number of both simulated and observed agents while the observed sample remained untouched in the previous plot. Increasing the number of agents simultaneously for both groups creates identical samples for the true simulation seed at each quantity of agents.

Moving away from the Optimum

For the next exercise we explore what happens to the criterion function if we move away from the optimum. We do this in a controlled fashion by changing the parameter values for delta and wage_fishing and fixing the latter in the constraints. Doing so should lead delta to move away from its optimal value of 0.95 during optimization.

[42]:

params_cand.loc["wage_fishing", "value"] = 0.072params_cand

[42]:

		value	lower_bound	upper_bound
category	name
delta	delta	0.930	0.690	1.000
wage_fishing	exp_fishing	0.072	0.066	0.078
nonpec_fishing	constant	-0.100	-0.200	-0.090
nonpec_hammock	constant	1.046	1.000	1.100
shocks_sdcorr	sd_fishing	0.010	0.000	0.500
	sd_hammock	0.010	0.000	0.500
	corr_hammock_fishing	0.000	0.000	0.500

The optimizer cannot retrieve the true parameter and does not reach a value of 0.

[43]:

criterion_msm = rp.get_moment_errors_func( params_true, options_new_seed, calc_moments, replace_nans, moments_obs, W)rslt_wrong_fix = minimize( criterion=criterion_msm, params=params_cand, algorithm="nag_pybobyqa", constraints=constr,)rslt_wrong_fix["solution_criterion"]

[43]:

813.7988993578208

Retrieving the true Parameter Vector

We now repeat the estimation with the new parameter vector and free up wage_fishing to retrieve the optimal values for both parameters.

The parameter for wage_fishing is still 0.072 since we fixed it for the prior estimation:

[44]:

params_cand.loc[:, "value"] = rslt_wrong_fix["solution_params"][["value"]]params_cand

[44]:

		value	lower_bound	upper_bound
category	name
delta	delta	0.923022	0.690	1.000
wage_fishing	exp_fishing	0.072000	0.066	0.078
nonpec_fishing	constant	-0.100000	-0.200	-0.090
nonpec_hammock	constant	1.046000	1.000	1.100
shocks_sdcorr	sd_fishing	0.010000	0.000	0.500
	sd_hammock	0.010000	0.000	0.500
	corr_hammock_fishing	0.000000	0.000	0.500

We now free up wage_fishing in the constraints in addition to delta.

[45]:

# Adjust constraints to free up both delta and wage_fishing.constr_u = constr_base.copy()constr_u.remove({"loc": "delta", "type": "fixed"})constr_u.remove({"loc": "wage_fishing", "type": "fixed"})

Freeing up the non-optimal wage_fishing improves the estimates. The criterion function value is much closer to 0 and the optimizer manages to retrieve the true parameter values quite closely.

[62]:

rslt_unfix = minimize( criterion=criterion_msm, params=params_cand[["value"]], algorithm="nag_pybobyqa", constraints=constr_u,)rslt_unfix["solution_criterion"]

[62]:

710.9165575145821

For easier comparison, we can compute the difference between the true and estimated value:

[63]:

deviation = params_true["value"] - rslt_unfix["solution_params"]["value"]deviation

[63]:

category namedelta delta 0.026703wage_fishing exp_fishing -0.001791nonpec_fishing constant 0.000000nonpec_hammock constant 0.000000shocks_sdcorr sd_fishing 0.000000 sd_hammock 0.000000 corr_hammock_fishing 0.000000Name: value, dtype: float64

Derivative-Based Optimization Algorithm

So far we have used only one optimization algorithm to estimate the parameter vector. The algorithm we used, BOBYQA (Bound Optimization by Quadratic Approximation), is a derivative-free optimization algorithm which works fairly well on our criterion function. As discussed above, our criterion function is a step function that contains plateaus for certain ranges of parameter values. An important implication of this property is that we cannot calculate proper derivatives and thus derivative-basedoptimization algorithms will not work to estimate the parameters.

To demonstrate this problem, we will now try to estimate the parameters using an optimization algorithm that does use derivatives during optimization.

[64]:

# Define candidate parameter vector and constraints.params_cand = params_true.copy()params_cand.loc["delta", "value"] = 0.93params_cand["lower_bound"] = [0.79, 0.066, -0.11, 1.04, 0, 0, 0]params_cand["upper_bound"] = [1, 0.072, -0.095, 1.055, 0.5, 0.5, 0.5]constr = constr_base.copy()constr.remove({"loc": "delta", "type": "fixed"})

We try the L-BFGS-B (Limited-Memory-Broyden-Fletcher-Goldfarb-Shanno with box constraints) algorithm. As shown below, L-BFGS-B fails and the optimizer returns the same parameter vector that we used as an input.

[65]:

criterion_msm = rp.get_moment_errors_func( params_true, options_new_seed, calc_moments, replace_nans, moments_obs, W)rslt = minimize( criterion=criterion_msm, params=params_cand, algorithm="scipy_lbfgsb", constraints=constr,)rslt

[65]:

{'solution_x': array([0.93]), 'solution_criterion': 7204.902739425344, 'solution_derivative': array([0.]), 'solution_hessian': None, 'n_criterion_evaluations': 1, 'n_derivative_evaluations': None, 'n_iterations': 0, 'success': True, 'reached_convergence_criterion': None, 'message': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL', 'solution_params': lower_bound upper_bound value category name delta delta 0.790 1.000 0.930 wage_fishing exp_fishing 0.066 0.072 0.070 nonpec_fishing constant -0.110 -0.095 -0.100 nonpec_hammock constant 1.040 1.055 1.046 shocks_sdcorr sd_fishing 0.000 0.500 0.010 sd_hammock 0.000 0.500 0.010 corr_hammock_fishing 0.000 0.500 0.000}

References

Adda, J., & Cooper, R. W. (2003). Dynamic Economics: Quantitative Methods and Applications. MIT press.
Adda, J., Dustmann, C., & Stevens, K. (2017). The Career Costs of Children. Journal of Political Economy, 125(2), 293-337.
Andrews, I., Gentzkow, M., & Shapiro, J. M. (2017). Measuring the Sensitivity of Parameter Estimates to Estimation Moments. The Quarterly Journal of Economics, 132(4), 1553-1592.
Bruins, M., Duffy, J. A., Keane, M. P., & Smith Jr, A. A. (2018). Generalized Indirect Inference for Discrete Choice Models. Journal of econometrics, 205(1), 177-203.
Davidson, R., & MacKinnon, J. G. (2004). Econometric Theory and Methods (Vol. 5). New York: Oxford University Press.
Evans, R. W. (2018, July 5). Simulated Method of Moments (SMM) Estimation. Retrieved November 30, 2019, from https://notes.quantecon.org/submission/5b3db2ceb9eab00015b89f93.
Frazier, D. T., Oka, T., & Zhu, D. (2019). Indirect Inference with a Non-Smooth Criterion Function. Journal of Econometrics, 212(2), 623-645.
Gourieroux, M., & Monfort, D. A. (1996). Simulation-based econometric methods. Oxford university press.
McFadden, D. (1989). A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration. Econometrica: Journal of the Econometric Society, 995-1026.