Labour Stats: April 2015

Thursday 30 April 2015

Markdown

I'm a sucker for simplicity. I found Word too multifunctional and LaTeX too hard. Then Markdown appeared, a plain-text markup in the same spirit as the wiki markup on Wikipedia, used in my Trello app and, informally, in emails.

Markdown syntax

There are some complications, such as hyperlinks, that I won't cover here so as not to obscure the absolute simplicity of Markdown. This is what you'll use:

# Heading 1
## Heading 2
### Heading 3 (you get the point)
*italic*
**bold**
![Graph alt text](./path.png)

Nothing more.
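To show how little there is to it, here is a minimal sketch in Python that turns exactly this subset into HTML. It is an illustration only, not how MacDown, Pandoc, or any real Markdown engine works; the function name `render_line` is made up for this example, and it assumes single asterisks mean italic and double asterisks mean bold, as in standard Markdown.

```python
import re


def render_line(line: str) -> str:
    """Convert one line of the small Markdown subset to HTML (illustrative only)."""
    # Images: ![alt text](path)
    line = re.sub(r"!\[([^\]]*)\]\(([^)]*)\)", r'<img alt="\1" src="\2">', line)
    # Bold (double asterisks) must be handled before italic (single asterisks).
    line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
    line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
    # Headings: one to six '#' characters followed by a space.
    match = re.match(r"(#{1,6}) (.*)", line)
    if match:
        level = len(match.group(1))
        return f"<h{level}>{match.group(2)}</h{level}>"
    # Anything else is returned as is (no paragraph tags in this sketch).
    return line


for sample in ["# Heading 1", "*italic*", "**bold**", "![Graph alt text](./path.png)"]:
    print(render_line(sample))
```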

Markdown on OS X

One address: MacDown. It is a beautiful program with a split-screen setup: on the left you have a WYSIWYM editor, on the right a live preview. Did I say it is beautiful? It's free.

Markdown on Windows

I have tried a few on Windows, but with no luck. There is always some premium function for sale and I don't like that. WriteMonkey is a good idea, but the PDF support is ... not included. There are also a bunch of web apps, most notably Dillinger and StackEdit. They look good, but I like to open a file from Windows Explorer, and with a web app that's just a pain.

In fact, gVim will do all you want, but there's a learning curve as it doesn't behave like a normal text editor (I like Notepad++, for instance, as an in-between solution).

Two programs, however, will do the job, and they are statistical software packages: R and Stata. Obviously, R is free, and Stata is not. You will know that I love Stata - I think it's the best, albeit expensive, textbook in statistics and it also has free software with it.

Markdown in R and RStudio

RStudio is brilliant software. Use the -knitr- package to enjoy writing Markdown files and exporting to PDF. It's made for that and it's widely used.

About: http://yihui.name/knitr/
How to install -knitr-: https://github.com/yihui/knitr#readme3
Options: http://rmarkdown.rstudio.com/pdf_document_format.html

Markdown in Stata

Stata is a different beast. In a way it is less flexible and the markdown integration doesn't feel as native as in R. However, it is also easier.

You output an SMCL log file, then you parse the file with the user-written command -markdoc- (Converting SMCL to Markdown and other formats using Pandoc). That's all there is to it. Remember to put your Markdown text between the comment delimiters /* and */, each of which goes on a separate line. Also use -quietly- to reduce output.

There are a couple of other Stata programs doing a similar job:

Weaver: HTML and PDF dynamic report producer
Ketchup: HTML and PDF dynamic report producer
Synlight: SMCL to HTML convertor and syntax highlighter

Just -ssc install- those and discover their help files.

For the -markdoc- command, see:

Markdoc author: http://haghish.com/statistics/stata-blog/reproducible-research/dynamic_documents/markdown.php
IDEAS: https://ideas.repec.org/c/boc/bocode/s457868.html
Alternatives: https://hopstat.wordpress.com/2014/01/11/stata-markdown-2/

Wednesday 29 April 2015

Weights in Stata

Stata has four kinds of weights; your average statistical software has one. Still, Stata is right, as I will try to explain in even simpler words than elsewhere. But let's leave the iweight to programmers and focus on the other three:

fweight, or frequency weight, is probably the easiest, but also the most abused. It says that one observation represents the number of cases indicated by the weight. Imagine you collapse a dataset by gender, region, and educational attainment, and you then regress education on gender: the count of each line would be your fweight. Data is commonly stored in this way to avoid duplicate lines. It follows that fweights should be integers, because there is no such thing as half a response.
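A small sketch of the idea in Python, with made-up numbers (the values and counts are hypothetical, not from any real survey): expanding the collapsed rows back into individual cases gives the same mean as weighting the collapsed rows by their counts.

```python
from statistics import mean

# Hypothetical collapsed data: (years of education, number of respondents).
collapsed = [(10, 3), (12, 5), (16, 2)]

# Expand each row into as many identical observations as its frequency weight.
expanded = [value for value, count in collapsed for _ in range(count)]

# Weighted mean computed directly on the collapsed rows.
total = sum(count for _, count in collapsed)
weighted_mean = sum(value * count for value, count in collapsed) / total

# Both lines describe the same 10 respondents and the same mean.
print(len(expanded), total)
print(mean(expanded), weighted_mean)
```

This only works because the counts are whole numbers: you cannot repeat a row 2.5 times.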

pweight, or sampling weight, corrects for a sample that does not mirror the population. For instance, you need to have 20% of women with an academic degree, but for some reason the sampling only gathered 10%, so you will want to double their weight. Using fweight would overestimate the number of cases and therefore underestimate the variance. In other software, you might rescale the weight so that it sums to the original n, but using pweight is better. Stata leaves you no choice because fweight does not work with nonintegers.
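The arithmetic behind the 10% versus 20% example can be sketched in Python. The group labels and the sample size of 1000 are assumptions for illustration; the weight is simply the population share divided by the sample share.

```python
# Assumed shares: women with an academic degree are 20% of the population
# but only 10% of the sample; everyone else is lumped into "other".
population_share = {"degree": 0.20, "other": 0.80}
sample_share = {"degree": 0.10, "other": 0.90}

# Sampling weight per group: population share divided by sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Apply the weights to a hypothetical sample of 1000 respondents.
n = 1000
counts = {g: sample_share[g] * n for g in sample_share}
weighted_total = sum(weights[g] * counts[g] for g in counts)
weighted_degree_share = weights["degree"] * counts["degree"] / weighted_total

print(weights)                          # degree is doubled, other is scaled down
print(round(weighted_total))            # weights built this way sum back to n
print(round(weighted_degree_share, 3))  # the degree group is back at its target share
```

Note that the weight for the "other" group is not a whole number, which is exactly the kind of weight fweight refuses.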

aweight, or analytic weight. Imagine that your data is collapsed and includes the mean of another variable, say wage. In that case the count still works as in fweight or pweight for point estimates, but precision increases with higher weights, because a mean is estimated more precisely the more cases it is based on. Both pweight and aweight rescale the weights so as not to inflate the number of cases above the total count, in contrast to fweight.

In sum, all weights return exactly the same coefficients, but different standard errors depending on the kind of data we're dealing with. One further note of caution: pweights and aweights are typically nonintegers, so precision is very important. I recommend storing such weights at double precision, not float. Also convert data from other formats using the 'double' option.
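Here is a rough Python sketch of that point for a simple weighted mean. It uses textbook formulas with made-up numbers and is not Stata's exact implementation; the helper `weighted_mean_and_se` is hypothetical. The only thing that differs between the two calls is how many cases the data are taken to represent.

```python
from math import sqrt

# Hypothetical collapsed data: group mean wage and group size.
wages = [10.0, 12.0, 15.0]
sizes = [20, 50, 30]


def weighted_mean_and_se(values, weights, n_cases):
    """Weighted mean and its standard error, treating the data as n_cases cases.

    The weights are first rescaled so that they sum to n_cases.
    """
    scale = n_cases / sum(weights)
    scaled = [w * scale for w in weights]
    mean = sum(v * w for v, w in zip(values, scaled)) / n_cases
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, scaled)) / (n_cases - 1)
    return mean, sqrt(variance / n_cases)


# Frequency-weight view: the rows stand for sum(sizes) = 100 cases.
f_mean, f_se = weighted_mean_and_se(wages, sizes, sum(sizes))
# Rescaled view: the weights sum to the number of rows, here 3.
a_mean, a_se = weighted_mean_and_se(wages, sizes, len(wages))

print(f_mean, a_mean)  # identical point estimates
print(f_se, a_se)      # the standard error is smaller when more cases are assumed
```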
