Tuesday 28 February 2017

Gross domestic product

Introduction

The gross domestic product (GDP) is arguably the most important economic indicator, drawing a picture of the total output an economy produces in a given year. The GDP can be approached in three ways:

  1. Production approach
  2. Income approach
  3. Expenditure approach
All three are equivalent but have different components: what is produced is equal to what is earned, which in turn is equal to what gets spent. I will address the components of each GDP definition, and briefly discuss how to deal with it in practice.

Production approach

GDP is the gross value of domestic output of all economic activities (GDP at market prices). This is the value of the total sales of goods and services plus the value of changes in inventories. In the production approach, this gross value of output consists of gross value added (GDP at factor cost) and the value of intermediate consumption (i.e., the cost of materials, supplies and services used to produce final goods or services).

Gross value added = gross value of output – value of intermediate consumption.

GDP at factor cost plus indirect taxes less subsidies on products is the "GDP at producer price".
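As a sanity check on these identities, here is a minimal sketch with invented figures (all numbers hypothetical):

```python
# Hypothetical figures for a small economy; all numbers invented.
gross_output = 250.0             # total sales plus changes in inventories
intermediate_consumption = 100.0
taxes_on_products = 30.0
subsidies_on_products = 5.0

# Gross value added = gross value of output - intermediate consumption
gva = gross_output - intermediate_consumption

# GDP = GVA + taxes on products - subsidies on products
gdp = gva + taxes_on_products - subsidies_on_products

print(gva, gdp)  # 150.0 175.0
```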

Income approach

GDP = COE + GOS + GMI + [T_PM – S_PM]
compensation of employees (COE) + gross operating surplus (GOS) + gross mixed income (GMI) + taxes less subsidies on production and imports (T_PM – S_PM).

Expenditure approach

Y = C + I + G + (X − M)
consumption (C) + investment (I) + government spending (G) + net exports (X – M)

Often Y is used in this definition instead of GDP.
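Both the income and the expenditure definitions should add up to the same GDP; a minimal sketch with invented, balancing figures:

```python
# Hypothetical national accounts; all numbers invented, chosen to balance.

# Expenditure approach: Y = C + I + G + (X - M)
C, I, G = 100.0, 40.0, 50.0   # consumption, investment, government spending
X, M = 30.0, 25.0             # exports, imports
Y_expenditure = C + I + G + (X - M)

# Income approach: GDP = COE + GOS + GMI + (T_PM - S_PM)
COE, GOS, GMI = 120.0, 50.0, 15.0
T_PM, S_PM = 12.0, 2.0
Y_income = COE + GOS + GMI + (T_PM - S_PM)

print(Y_expenditure, Y_income)  # both 195.0
```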

Gross value added

Gross value added (GVA) is, in economics, the measure of the value of goods and services produced in an area, industry or sector of an economy, i.e. a measure of output.

GVA = GDP - intermediate consumption (production approach)
GDP = GVA + taxes on products - subsidies on products
GVA = GDP + subsidies on products - taxes on products

If taxes > subsidies, GDP > GVA, which is the case for Belgium.

Gross value added is used for measuring gross regional domestic product and other measures of the output of entities smaller than a whole economy, for which taxes less subsidies are only measured at the aggregate level.

Over-simplistically, GVA is the grand total of all revenues, from final sales and (net) subsidies, which are incomes into businesses. Those incomes are then used to cover expenses (wages & salaries, dividends), savings (profits, depreciation), and (indirect) taxes.

More on the difference in interpretation between GVA and GDP on Quora.

GDP, GNP, GNI, and HDI

GDP can be contrasted with gross national product (GNP) or gross national income (GNI). The difference is that GDP defines its scope according to location, while GNP/GNI define their scope according to ownership. In a global context, world GDP and world GNP/GNI are therefore equivalent terms. Most often we care about GDP, but a large gap between GDP and GNI might indicate that an economy is colonized or colonizing.

The Human Development Index (HDI) was created by the United Nations to emphasize that people and their capabilities should be the ultimate criteria for assessing the development of a country, not economic growth alone. The HDI can also be used to question national policy choices, asking how two countries with the same level of GNI per capita can end up with different human development outcomes. These contrasts can stimulate debate about government policy priorities. The Human Development Index (HDI) is a summary measure (geometric mean) of average achievement in key dimensions of human development:

  1. Health: a long and healthy life (life expectancy)
  2. Education (years of schooling)
  3. Standard of living (log of the GNI per capita)
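A rough sketch of how these dimensions combine (the goalposts and values below are invented or simplified for illustration; the actual UNDP methodology combines mean and expected years of schooling in the education index):

```python
from math import log

# All goalposts and country values are invented/simplified; the education
# dimension is collapsed into a single schooling index for illustration.
def dimension_index(x, lo, hi):
    return (x - lo) / (hi - lo)

health = dimension_index(80.0, 20.0, 85.0)       # life expectancy, years
education = dimension_index(12.0, 0.0, 15.0)     # years of schooling
income = dimension_index(log(35000.0), log(100.0), log(75000.0))  # GNI p.c.

# The HDI is the geometric mean of the three dimension indices
hdi = (health * education * income) ** (1.0 / 3.0)
print(round(hdi, 3))
```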

In practice, GDP and the HDI correlate very strongly, although there are notable differences, which appear even sharper in other 'happiness' indicators. For instance: if two persons each decide to stay at home to clean the house and care for the children, nothing is registered in GDP. If each cleans the house of the other and they pay each other the same wage for the job, we have precisely the same utility, but two wages are added to GDP. I nevertheless favour the use of GDP as an objective measurement of economic development (in the example given, a market for cleaning opens up), keeping in mind that we are more interested in utility than in the mere quantity of goods and services provided.

GDP figures from Eurostat

GDP figures can be downloaded from Eurostat (key: namq_10_gdp). Below are a few tips to select the correct indicators:
  • The European System of Accounts (ESA) gives a guideline for the measurement of production in sectors. If you want to have a good laugh, look at what happened in Ireland in 2015: 30% GDP growth because of a change in the guidelines! Depending on the source you will find longer time series in one or the other system, but for recent years in Europe, you should take the most recent ESA.
  • The broadest GDP in nominal terms is 'GDP at market prices'. Mostly you want it in EUR or USD, not in the national currency, unless exchange rates are what fascinates you.
  • GDP in real terms is found in the 'chain linked volume' series. Basically, chain linking means output is expressed in the prices of the previous year, and as it goes this boils down to using a base year.
Note that ESA/Eurostat use code B for GDP components in the production approach, code D for components in the income approach, and code P for components in the expenditure approach (the most extensive set).
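A minimal sketch of what chain linking does, with invented numbers for a single good (with many goods, the price weights would update every year, which is the point of chaining):

```python
# One good over three years; all numbers invented.
quantities = [100.0, 104.0, 110.0]
prices = [10.0, 10.5, 11.0]

# Volume growth of year t, valued at the prices of year t-1
growth = [
    (quantities[t] * prices[t - 1]) / (quantities[t - 1] * prices[t - 1])
    for t in range(1, len(quantities))
]

# Chain the growth factors onto the nominal level of the reference year
chained = [quantities[0] * prices[0]]
for g in growth:
    chained.append(chained[-1] * g)

print(chained)  # volumes expressed in reference-year prices
```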

Together with Sebastien Fontenay I have made the -eurostatuse- command in Stata to fetch data from Eurostat. To make indices, I have made the -reindex- command, which you can find elsewhere on this blog or get by sending me an email.

Friday 17 February 2017

Political conviction by field of study

This post is a small investigation. You often hear economic opinions, mainly about the labour market, put forward by politicians or business people who are not labour economists. That is of course not forbidden, but it sometimes muddies the image people have of economists.

Labour economics

Maarten Goos
Joep Konings
Eddy Omey
Filip Abraham

Business administration / applied economics / management

Johan Van Overtvelt
Marc Descheemaecker (+ European economics, College of Europe), former CEO of NMBS, chairman of the board of Brussels Airport
Donald Trump (Wharton, bachelor's degree, real estate)

Other economics

Koen Schoors
Gert Peersman
Geert Noels
Freddy Heylen
Andreas Tirez, Liberales
Luc Denayer, secretary of the CRB
Herman Van Rompuy, president of the European Council
Yanis Varoufakis (game theory)
Jeroen Dijsselbloem (agricultural economics)
Luc Versele, CEO Crelan
Frank Vandenbroucke, former chairman of SP.a
Peter De Roover, chairman of the Vlaamse Volksbeweging

History

Bart De Wever, chairman of N-VA
Mark Rutte, VVD, prime minister

Medicine

Maggie De Block (general practitioner), minister of public health, Michel I government
Daniel Bacquelaine (general practitioner), minister of pensions, Michel I government

Communication sciences

Freya Van den Bossche, Flemish minister, SP.a

Political science

Kristof Calvo
Jan Peumans
Willy Claes, socialist politician, secretary general of NATO
Caroline Gennez, SP.a

Law

Gerolf Annemans, Vlaams Blok
Jo Libeer, former CEO of VOKA
Kris Peeters, minister of employment, former chairman of UNIZO (+ candidate in philosophy, + postgraduate accountancy/taxation at Vlerick)
Geert Bourgeois
Gwendolyn Rutten (+ political science)
Charles Michel, prime minister
Luc Van den Bossche (doctor), SP.a
Johan Vande Lanotte (professor), SP.a, mayor of Oostende
Yves Leterme (+ political science), CD&V, prime minister
Bert Anciaux, VU-ID21/Spirit/SP.a
Jean-Luc Dehaene (doctor)
Wolfgang Schäuble (+ economics)
Nicolas Sarkozy
Rik Torfs (canon law)
Guy Verhofstadt, Open VLD, prime minister
Etienne Davignon, businessman
Danny Pieters, N-VA

Language & literature

Siegfried Bracke (Germanic philology, Han Vermeer / Valère Descherp), Flemish-nationalist politician, former journalist
Louis Tobback, SP.a, mayor of Leuven

Sociology

Hans Bonte (+ political science)
Jan Denys, labour market expert at Randstad
Jos Geysels, former chairman of Agalev
Kathleen Van Brempt, member of the European Parliament, SP.a

Psychology

Daniel Kahneman, Nobel laureate in economics, Thinking Fast and Slow

Philosophy

Other

Jesse Klaver (social work), leader of GroenLinks
Karel Van Eetvelt (sports sciences), chairman of UNIZO
Geert Wilders (bachelor in insurance)
Meyrem Almaci (comparative cultural studies ~ anthropology)
Filip Dewinter (bachelor in journalism)
Jan Jambon (licentiate in computer science)
Jean Jacques De Gucht (cultural agogics), group leader in the senate
Tom Van Grieken (bachelor in communication management)
Guy Van Hengel (schoolteacher), Open VLD
Elke Sleurs (gynaecologist), state secretary, N-VA
Alexis Tsipras (engineer)
Elio Di Rupo (chemist)
François Hollande (haute école)
Albert Frère (no formal education)
Theo Francken (licentiate in pedagogy)
Marc Coucke (pharmacy)
Steve Stevaert, socialist, mayor of Hasselt, chairman of SP.a, governor of Limburg (hotel and catering)
Karel Dillen, founder of Vlaams Blok (secondary school humanities)


Monday 6 February 2017

Making sense of measuring effect size (eta-squared)

Introduction: it's impossible

We like to know whether effects are significant, or more significant than others, but generally avoid the issue of whether effects are substantial. On rare occasions, when an OLS regression suffices, a statistician may want to report R-squared values. However, do not expect much enthusiasm, as R-squared can only increase when adding additional variables even if they do not make sense. Adjusted R-squared, AIC and BIC measures correct for this. Also, R-squared has to be computed differently depending on whether or not there is a constant in the model (see Hayashi, 2000, p. 20). Worse, if your model is not OLS, you will need to define some pseudo-R-squared.

Needless to say, if you want an R-squared value for any subset of explanatory factors, statisticians will hate you. This is not possible, because of collinearity, they will say. You can stepwise add or delete variables and check the change in R-squared, but this will include or remove variance that is due to other variables. Sadly!

Objection: it may be possible

The people at UCLA, in fact Philip B. Ender, thought differently. They have a Stata package called -regeffectsize- up for grabs, which partitions the variance explained by the separate explanatory factors, giving a good indication of how substantial effects are. The help file is not very instructive, so I deciphered the .ado myself, and here is the report. I am sorry this is not (yet) in LaTeX, but speed over quality for now.

The structure of the program is as follows:
  • From the regression, they take:
    • TSS, total sum of squares in Y, which is RSS+MSS
    • RSS: residual sum of squares
    • MSS: the model sum of squares
    • DFR: residual degrees of freedom, which is n-k (n: sample size, k: number of variables in the model)
    • If you divide RSS by DFR you get the mean squared error (MSE)
  • From the F-test (simply -test-), they take:
    • The F-value
Obviously, eta-squared as we know it is ETA2 = MSS/TSS, and it is generally decomposed into a within and a between effect. Typically we get an ETA2 for the second level of a multi-level model, to indicate how much variation exists between versus within the second-level units. This is also one formulation of the intraclass correlation coefficient (ICC).

So how do they get to the (semi)partial eta-squared formula? There are three formulations, which are more or less the same and build on each other. The basis is the effect sum of squares, which is ESS = F*MSE.

Semipartial eta-squared

ETA2 = ESS/TSS = F*MSE/TSS

A stroke of genius, if you ask me, to use a test statistic (F) in the equation for the partitioning of variance. You might think that if F is really key, then our partitioning will be little more than a comparison of significance levels. The higher ESS, the more important the explanatory variable – for this it needs to be significant. In Hayashi, 2000, p. 53, I found some intuition that may be helpful: the F-test is a monotone transformation of the likelihood ratio.

Recall that MSE = RSS/DFR = RSS/(n-k), and add that F = (n-k)*(RSSr/RSSu - 1) when there is only one restriction to test against the null. It is easy to see that n-k drops from the RHS and we are left with:

ETA2 = [ RSS*(RSSr-RSSu)/RSSu ] / TSS

RSSu are the unrestricted errors when beta is as calculated. Hence RSSu = RSS (the model residual sum of squares). RSSr are the restricted errors under the null (i.e. beta = 0). We can rewrite the equation to:

ETA2 = (RSSr-RSS)/TSS

Say there is only one variable in the model; then RSSr = TSS, and ETA2 = R2 = MSS/TSS. In general, the better the prediction, the lower RSS will be and the higher ETA2. Note that by definition RSSr > RSS, unless the variable explains nothing, in which case RSSr = RSS and ETA2 = 0. What matters is the difference if we 'exclude' a variable: RSSr rises proportionally to the importance of that variable.

Is this a good metric? I would tend to think so. The advantage of this approach, in my view, is that the betas of the other variables are not biased by the omission of one variable, so MSS is unchanged and RSSr will be correct. There is also no difference in DFR between the models to compare, as there would be with stepwise deletion. To me, this looks useful as an approach to explain how much of total variance is explained by one explanatory variable, although I may be overlooking something. In that case, let me know.

Oh, I don't know why it would be semi-partial – the .ado uses the macro `spe2', so it's just a guess.

Percentage eta-squared

%ETA2 = 100*ETA2/R2

I think that is obvious enough. The output says 'change eta-squared', but there is no change involved.

Partial eta-squared

Here we have a small change: the variance explained by the other variables is not included in the denominator. The partial eta-squared will always be larger than the semipartial eta-squared, and markedly so when ESS approaches MSS. It could be a solution to avoid the issue of having much shared variance in MSS which is not in ESS, but I'm not sure.

Part.ETA2 = ESS / (ESS+RSS)
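To check my reading of the .ado, here is a sketch in Python (not the Stata package itself) that computes the three metrics for a simulated regression and verifies that the two formulations of the semipartial eta-squared coincide:

```python
import numpy as np

# Simulate y = 1 + 2*x1 + 0.5*x2 + noise; all numbers invented.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

X = np.column_stack([np.ones(n), x1, x2])   # unrestricted model
Xr = np.column_stack([np.ones(n), x1])      # restricted model: x2 excluded

RSS, RSSr = rss(X, y), rss(Xr, y)
TSS = ((y - y.mean()) ** 2).sum()
MSS = TSS - RSS
k = X.shape[1]

# Semipartial eta-squared for x2, in both formulations
MSE = RSS / (n - k)
F = (RSSr - RSS) / MSE                      # F-test, one restriction
ESS = F * MSE                               # effect sum of squares
eta2_semi = ESS / TSS                       # equals (RSSr - RSS) / TSS

# Percentage eta-squared: share of the explained variance
pct_eta2 = 100 * eta2_semi / (MSS / TSS)

# Partial eta-squared: other variables' variance left out of the denominator
eta2_partial = ESS / (ESS + RSS)

print(eta2_semi, pct_eta2, eta2_partial)
```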

References

Hayashi, F. (2000). Econometrics. Princeton, NJ: Princeton University Press.


Stata user command -norm-

Normalizing a variable is not so hard to do. The formula is:

normvar = (oldvar - minimum) / (maximum - minimum)

Hence you get a long way with the -summarize- command and r(min) and r(max).

However, in some cases you want to normalize to a wider range than appears in the data, e.g. rescale a variable that was coded 1-5 to 0-100, even though no unit has the value 1 or 5. Perhaps such cases exist in another pool, so you want to transform the variable in the same way. Again this is easy to do, but it may get tedious and make your syntax look ugly and confusing.
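As a sketch of the rescaling logic (the function name and defaults below are my own, not those of the Stata command):

```python
def norm(values, scale=1.0, minimum=None, maximum=None):
    """Rescale values to 0..scale, optionally against a theoretical
    minimum/maximum wider than the observed range."""
    lo = min(values) if minimum is None else minimum
    hi = max(values) if maximum is None else maximum
    return [scale * (v - lo) / (hi - lo) for v in values]

# A 1-5 item rescaled to 0-100, although no one answered 1 or 5
print(norm([2, 3, 4], scale=100, minimum=1, maximum=5))  # [25.0, 50.0, 75.0]
```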

This is where the -norm- command may be of use.

Syntax

norm var [, replace generate(newvarname) scale(integer) maximum(integer) minimum(integer) [no]label]

The default scale is 1, hence range: 0-1.
By default, value 0 gets the label 'Minimum' and the maximum value gets a label too. You can remove the labels with the option nolabel.

Examples

norm q89, scale(100) replace
norm q93, scale(20) max(5)

Installation

Download the files and put them in your ado-folder (plus/personal) or refer to the containing folder by adopath + folderpath.