I demonstrate this below using a simple Monte Carlo simulation.
In a tobit-like fashion, one would like to control for the probability of selection, but that probability is determined by gender, which is already in the model.
It would be interesting, and easy, to base this simulation on the actual mean, standard deviation and skewness observed in the (male) population.
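clear
set seed 12345 // arbitrary seed, only so that a run can be reproduced
cap erase mc.dta // remove results left over from an earlier run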
* simulation parameters (constant across all runs)
local m = 2000 // mean
local s = 200 // standard deviation (spread)
local f = 0.1 // feminisation
local a = 3 // positive: right skewed (for an exact skew-normal the skewness can be computed from a)
foreach n of numlist 60 80 100 120 140 200 300 500 800 1000 2000 { // sample sizes
    forvalues r = 1/500 { // 500 replications per sample size
        clear
        *local n = 1000
        set obs `n'
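        gene g = 1-(runiform()<`f') // g = 0 with probability f (the feminised share), 1 otherwise
        gene rn = runiform()
        * right-skewed draw for a > 0; it mimics the skew-normal density 2*phi(x)*Phi(a*x)
        * but is not an exact skew-normal draw, so the mean and sd of w differ somewhat from m and s
        gene d = 2*invnormal(rn)*normal(rn*`a')
        gene w = max(1400,`m'+d*`s') // w is floored at 1400
        *twoway kdensity w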
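        collapse w, by(g) // mean of w within each group
        gene nsize = `n'
        gene c = 1 // constant id needed by reshape
        reshape wide w, i(c) j(g) // one row holding w0 and w1
        list
        cap gene gpg = w1/w0 // ratio of the two group means
        cap gene gpg = . // only takes effect if the line above failed (a group was empty)
        cap append using mc.dta // fails harmlessly on the very first replication
        cap save mc.dta, replace
    }
}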
replace gpg = round(100*gpg-100,.01) // gap in percent; the population gap is 0 by construction
tabstat gpg, by(nsize)
*scatter gpg nsize
exit
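
The remark on the shape parameter a (skewness can be computed from a) holds for Azzalini's skew-normal distribution. The following is a minimal sketch under that assumption, not part of the simulation above: it computes the skewness implied by a and draws an exact skew-normal variable to check it. The names delta, z0, z1 and d_sn, the seed and the number of observations are illustrative choices.

```stata
clear
set seed 12345                  // arbitrary seed, for reproducibility only
set obs 100000                  // large sample so that the check is tight
local a = 3                     // shape parameter, same value as in the simulation
local delta = `a'/sqrt(1 + `a'^2)

* theoretical skewness of a skew-normal variable with shape a
local skew = (4 - _pi)/2 * (`delta'*sqrt(2/_pi))^3 / (1 - 2*`delta'^2/_pi)^1.5
display "implied skewness: " %6.3f `skew'

* exact draw: delta*|z0| + sqrt(1 - delta^2)*z1 with z0, z1 independent N(0,1)
generate double z0 = rnormal()
generate double z1 = rnormal()
generate double d_sn = `delta'*abs(z0) + sqrt(1 - `delta'^2)*z1

summarize d_sn, detail
display "sample skewness:  " %6.3f r(skewness)
```

To match a target mean and standard deviation, d_sn would first have to be standardized with its theoretical mean delta*sqrt(2/_pi) and standard deviation sqrt(1 - 2*delta^2/_pi) before being scaled by s and shifted by m.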