Suggestion for change of how predicted values are defined #144
Comments
I realised that what I have proposed above is exactly the predictive margins concept from Stata. Although this is not ported into the
Hi @langbe, it's not going to be part of margins, unfortunately, as I want the concepts to be in different packages for clarity. But you're right that this is what should be happening. I've unfortunately had negative bandwidth for open source work due to the pandemic.
langbe commented Apr 8, 2020 (edited)
Calculating predictions with cplot() and what = "prediction" produces predictions that hold the values of the data at their mean or mode. Although the quantity we are trying to estimate is not uniquely defined, in the context of AMEs it seems more reasonable to me to produce different predictions. My proposal is based on the work of Bartus: https://journals.sagepub.com/doi/pdf/10.1177/1536867X0500500303
Let us assume we have a binary variable x_i. According to equation (5) the AME estimator can be written as
AME_i = 1/n sum_{k = 1}^n [F(beta * x^k | x_i^k = 1) - F(beta * x^k | x_i^k = 0)]
or equivalently
AME_i = 1/n sum_{k = 1}^n F(beta * x^k | x_i^k = 1) - 1/n sum_{k = 1}^n F(beta * x^k | x_i^k = 0).
In the latter representation we may consider each of the two terms as a (conditional) predicted value: the first for x_i = 1 and the second for x_i = 0. These individual predicted values are consistent with the corresponding AME estimate in the sense that their difference exactly equals the AME estimate. Currently, the difference of the two predicted values is not the same as the AME estimate.
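A minimal sketch of this decomposition in base R, assuming a logistic model on the built-in mtcars data with am as the binary variable x_i. The outcome vs and the covariate wt are illustrative assumptions, not taken from this issue. The sketch does not call cplot(); it emulates "other covariates at their mean" by hand, so the last number only approximates what cplot() would report.

```r
# Assumed example model: logistic regression with `am` as the binary regressor.
fit <- glm(vs ~ am + wt, data = mtcars, family = binomial)

# Counterfactual copies of the data: every observation gets am = 1, then am = 0.
d1 <- mtcars; d1$am <- 1
d0 <- mtcars; d0$am <- 0
p1 <- predict(fit, newdata = d1, type = "response")
p0 <- predict(fit, newdata = d0, type = "response")

# Proposed predicted values: the two averaged terms of the second formula.
am_1 <- mean(p1)
am_0 <- mean(p0)

# AME from the first formula: average of the per-observation differences.
ame <- mean(p1 - p0)

# Hand-rolled "at the mean" predictions (wt fixed at its sample mean).
at_mean <- data.frame(am = c(0, 1), wt = mean(mtcars$wt))
yvals <- predict(fit, newdata = at_mean, type = "response")

# Compare the three quantities side by side.
print(c(ame = ame,
        margins_diff = am_1 - am_0,
        at_mean_diff = unname(diff(yvals))))
```

By construction, am_1 - am_0 matches ame up to floating-point error, whereas the at-mean difference generally does not under a nonlinear link.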
For example
Applying the above mentioned formula we get:
I realize that am_1 and am_0 are quite different from the current individual predictions (yvals), but I nevertheless wanted to propose using those instead. Happy to hear your thoughts on that.

Remarks:

Note that I was not sure whether the cplot call as above but with what = "effect" is what I was looking for. However, the result has zero observations in it, so I assume it is not.

This should analogously generalize to categorical variables with more than 2 levels (and to continuous variables).
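The case with more than 2 levels could be sketched the same way in base R: set the factor to each level in turn for every observation and average the predictions. The model (vs ~ gear + wt on the built-in mtcars data) and the helper name margin_at are hypothetical choices for illustration only; continuous variables are not covered here.

```r
# Assumed example: `gear` (3, 4, 5) treated as a three-level factor.
dat <- mtcars
dat$gear <- factor(dat$gear)
fit <- glm(vs ~ gear + wt, data = dat, family = binomial)

# Hypothetical helper: average prediction when every observation
# is assigned the given level of `gear`, other covariates as observed.
margin_at <- function(level) {
  counterfactual <- dat
  counterfactual$gear <- factor(rep(level, nrow(dat)), levels = levels(dat$gear))
  mean(predict(fit, newdata = counterfactual, type = "response"))
}

# One predictive margin per level, named by level.
pred_margins <- sapply(levels(dat$gear), margin_at)

# Contrasts against the first level; each one is the average of the
# per-observation differences, analogous to the binary AME.
contrasts_vs_first <- pred_margins - pred_margins[1]

print(pred_margins)
print(contrasts_vs_first)
```

Differences between any two entries of pred_margins then play the role that am_1 - am_0 plays in the binary case.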