Friday, 23 October 2015

Stata float detail

Beware: Stata may be the most comfy and reliable software for STATistical Analysis, but it sucks in comparison with your handheld calculator if you want to reproduce the simplest of figures. Why? Because computers store numbers in binary and so cannot represent a decimal such as 0.1 exactly (the decimal issue), and because there is a trade-off between precision on the one hand and speed or efficient storage on the other (the rounding issue).

Below is an experiment that shows the second issue. Create three variables holding the same long number, stored as float, as double, and with the default type. Because the number has more than the roughly 7 digits of precision a float can hold, the float variable will be rounded.
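The decimal issue can be seen in a few lines. This is a minimal sketch, assuming a fresh session in which the default storage type is still float; the variable name x is only for illustration. Run it from a do-file, because the // comments are not accepted on the command line.

```stata
clear
set obs 1
gene x = 0.1                  // stored as float, the default type
count if x == 0.1             // 0: the stored float is not the double 0.1
count if x == float(0.1)      // 1: compare at float precision instead
display %21.18f 0.1           // the double that stands in for 0.1
display %21.18f float(0.1)    // the float that stands in for 0.1
```

The literal 0.1 in an expression is evaluated in double precision, while x holds the float version, so the first comparison finds no match. Wrapping the literal in float() rounds it the same way the variable was rounded.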

clear
set obs 1
* same number, three storage types
gene float  varfloat   = 2567987654
gene double vardouble  = 2567987654
gene vardefault        = 2567987654

* changing the value later does not change the storage type
gene float varfloatchange = 10
replace varfloatchange    = 2567987654
format * %15.0f
list

     +---------------------------------------------------+
     |   varfloat    vardouble   vardefault   varfloat~e |
     |---------------------------------------------------|
  1. | 2567987712   2567987654   2567987712   2567987712 |
     +---------------------------------------------------+

As you can see, the default type was float, and Stata did not store any of the float variables with the same precision as the input. Instead we get another figure, 2567987712. It is not random: rerun the syntax and it is always the same. It is the closest value a float can hold. A float keeps 24 binary digits (roughly 7 decimal digits), so between 2^31 and 2^32 it can only store multiples of 256, and 2567987712 is the multiple of 256 nearest to 2567987654. Only the double variable has room for the full number.
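To check where 2567987712 comes from, the rounding can be reproduced without creating any variables. This is a sketch that assumes Stata's float is the usual 4-byte IEEE single-precision format with a 24-bit significand; the float() function rounds a number to float precision. Run it from a do-file so that the // comments are accepted.

```stata
display %15.0f float(2567987654)              // 2567987712
display 2^(floor(ln(2567987654)/ln(2)) - 23)  // 256, the gap between floats here
display %15.0f round(2567987654, 256)         // 2567987712, nearest multiple of 256
display %15.0f float(2567987654) - 256        // 2567987456, the next float below
```

The input lies 58 below 2567987712 and 198 above 2567987456, so the float ends up at the upper neighbour.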


Thursday, 22 October 2015

Stata code for Herfindahl concentration index

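The Herfindahl concentration index is the sum of all squared shares within a unit. The score runs from close to 0 (total dispersion over many small shares) to 1 (total concentration in a single share); with N shares the lowest possible value is 1/N.

In the working example the shares are the hours per job of each worker in a year: the data hold one row per job, with the variables hours, workerid and year.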

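* total hours per worker per year
tempvar t1
egen `t1' = total(hours), by(workerid year)
* squared share of each job in that total
tempvar t2
gene `t2' = (hours / `t1')^2
* sum of the squared shares per worker per year
egen herfindahl = total(`t2'), by(workerid year)
drop `t1' `t2'
label var herfindahl "Herfindahl concentration index"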
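To see the index at work, here is a small sketch on made-up data: the three workers and their hours are invented for illustration only. Run it from a do-file so that the temporary variables stay available between lines.

```stata
clear
input workerid year hours
1 2015 30
1 2015 10
2 2015 40
3 2015 20
3 2015 20
end

* same steps as above, on the toy data
tempvar t1 t2
egen `t1' = total(hours), by(workerid year)
gene `t2' = (hours / `t1')^2
egen herfindahl = total(`t2'), by(workerid year)
list workerid year hours herfindahl, sepby(workerid)
```

Worker 1 splits 40 hours into 30 and 10, which gives (30/40)^2 + (10/40)^2 = 0.625. Worker 2 has a single job and scores 1, and worker 3 splits the hours evenly over two jobs and scores 0.5, the minimum of 1/N for N = 2.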