Data Management and Analysis: Data Transformations

These operations compute functions of your data and create new variables. Data may be imported into the environment or generated internally with the random number generators.

Command Structures

  • Conditional: If(...), Else, Then...
  • Parentheses to any number of levels
  • Implied multiplication of grouped variables and expressions

Algebraic Transformations

  • Standard operators: + - * / ^ (power)
  • Binary variable operators: x > 1 evaluates to the indicator 1(x > 1), which equals 1 when the condition holds and 0 otherwise; likewise for <, >=, etc.
  • Comparison operators: x ! y = max(x,y), x ~ y = min(x,y)
  • Expand categorical variable into a set of dummy variables
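
The algebraic transformations above can be illustrated outside the package. The following is a minimal Python sketch, not the package's own syntax; the variable names x, y and region and their values are hypothetical sample data.

```python
# Hypothetical sample data; x, y and region are illustrative names only.
x = [0.5, 1.5, 2.0, 0.9]
y = [1.0, 1.0, 3.0, 0.2]
region = ["north", "south", "north", "west"]

# Binary variable: the indicator 1(x > 1).
x_gt_1 = [1 if v > 1 else 0 for v in x]

# Observation-by-observation maximum and minimum of two variables.
xy_max = [max(a, b) for a, b in zip(x, y)]
xy_min = [min(a, b) for a, b in zip(x, y)]

# Expand a categorical variable into one 0/1 dummy variable per category.
categories = sorted(set(region))
dummies = {c: [1 if r == c else 0 for r in region] for c in categories}
```

In a regression with a constant term, one of the dummy variables would normally be dropped to avoid perfect collinearity.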


  • Log, Exp, Abs, Sqr, Sin, Rsn (arcsin), Cos, Rcs (arccos), Tan
  • Gamma, digamma, trigamma, log gamma, beta
  • Sign, fix (round to nearest integer), integer part
  • Box-Cox transformation and derivatives of Box-Cox
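
As an illustration of the Box-Cox item, the sketch below uses the standard definition (x^lambda - 1)/lambda, with ln x as the limit at lambda = 0. It assumes that the derivative of interest is the one with respect to lambda; the list above does not say which derivatives are provided. The function names box_cox and box_cox_dlam are hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam

def box_cox_dlam(x, lam):
    """Derivative of the Box-Cox transformation with respect to lam (assumed target)."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    log_x = math.log(x)
    if lam == 0:
        return 0.5 * log_x ** 2  # limit of the general expression as lam -> 0
    return (lam * x ** lam * log_x - (x ** lam - 1.0)) / lam ** 2
```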

Random Number Generation

  • Continuous random variables: uniform, normal, lognormal, t, chi-squared, F, exponential, Weibull, Gumbel, gamma, beta, logistic, Cauchy, truncated standard normal
  • Discrete random variables: Poisson, discrete uniform, binomial, geometric
  • Halton sequences
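
Halton sequences are deterministic, evenly spread points on (0,1) that are often used in place of pseudorandom uniform draws. The sketch below computes them with the usual radical-inverse construction; the function name halton is hypothetical and this is not the package's own generator.

```python
def halton(index, base):
    """Return element number `index` (index >= 1) of the Halton sequence for a prime base."""
    if index < 1 or base < 2:
        raise ValueError("index must be >= 1 and base must be >= 2")
    result = 0.0
    fraction = 1.0
    i = index
    while i > 0:
        fraction /= base                   # place value of the next digit
        result += fraction * (i % base)    # reflect the digit about the radix point
        i //= base
    return result

# First three draws in base 2: 0.5, 0.25, 0.75
draws_base2 = [halton(i, 2) for i in range(1, 4)]
```

Using a different prime base for each dimension is the usual way to build multidimensional draws.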

Probability Distributions

  • Logistic: logit function, density, cdf
  • Univariate normal: density, cdf, truncated means and variances, sample selection ‘lambda,’ inverse normal cdf, inverse cdf to pdf
  • Bivariate normal: pdf, cdf, partial derivatives
  • Multivariate normal: cdf
  • Means, deviations, standardized variables
  • Panel data, group means and deviations
  • Rowwise moments of a set of variables
  • Multiple of a set of variables
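
Several of the distribution functions listed above can be sketched with the Python standard library. The example assumes that the sample selection 'lambda' is the inverse Mills ratio, phi(z)/Phi(z), which is the usual meaning in sample selection models; the function names are hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def selection_lambda(z):
    """Inverse Mills ratio phi(z) / Phi(z), assumed to be the selection 'lambda'."""
    return std_normal.pdf(z) / std_normal.cdf(z)

def truncated_mean_above(a):
    """Mean of a standard normal variable truncated from below at a, E[z | z > a]."""
    return std_normal.pdf(a) / (1.0 - std_normal.cdf(a))

def logistic_cdf(x):
    """Logistic cdf, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Logit function, the inverse of the logistic cdf, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Inverse normal cdf: the 97.5th percentile of the standard normal.
z_975 = std_normal.inv_cdf(0.975)
```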

Recode ranges of values or recode specific values to discrete values
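
A recode maps ranges of values, or specific values, to discrete codes. The following is a minimal sketch under assumed rules; the function name recode_ranges, the age bands and the codes are all hypothetical.

```python
def recode_ranges(value, rules):
    """Return the code of the first (low, high, code) rule with low <= value <= high.

    Values that match no rule are returned unchanged.
    """
    for low, high, code in rules:
        if low <= value <= high:
            return code
    return value

# Hypothetical rules: age bands recoded to 1, 2, 3.
age_rules = [(0, 17, 1), (18, 64, 2), (65, 120, 3)]
ages = [5, 18, 70, 64]
age_group = [recode_ranges(a, age_rules) for a in ages]

# Recoding specific values: a lookup table, leaving other values as they are.
value_map = {-9: 0, 99: 0}
raw = [3, -9, 7, 99]
cleaned = [value_map.get(v, v) for v in raw]
```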

Sort ascending or descending (carry other variables and/or labels)
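
Sorting on one variable while carrying others along amounts to computing a row ordering once and applying it to every column. A minimal sketch with hypothetical variables income, age and label:

```python
# Hypothetical data set with three observations.
income = [52.0, 31.5, 78.2]
age = [41, 29, 55]
label = ["b", "a", "c"]

# Row order that sorts income ascending (use reverse=True for descending).
order = sorted(range(len(income)), key=lambda i: income[i])

# Apply the same order to the sort key and to the carried variables.
income_sorted = [income[i] for i in order]
age_sorted = [age[i] for i in order]
label_sorted = [label[i] for i in order]

# Descending order of the same key.
order_desc = sorted(range(len(income)), key=lambda i: income[i], reverse=True)
```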

Trends and seasonal dummy variables
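
A time trend and a set of seasonal dummy variables can be built from the observation number alone. The sketch below assumes quarterly data whose first observation falls in quarter 1; the names n_obs and periods_per_year are hypothetical.

```python
n_obs = 8             # hypothetical sample length
periods_per_year = 4  # quarterly data (assumption)

# Linear trend: 1, 2, ..., n_obs.
trend = list(range(1, n_obs + 1))

# Season of each observation, assuming observation 1 is in season 1.
season = [(t - 1) % periods_per_year + 1 for t in trend]

# One 0/1 dummy variable per season.
seasonal_dummies = {
    s: [1 if q == s else 0 for q in season]
    for s in range(1, periods_per_year + 1)
}
```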

Stratification variables and period variables for panels
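
For a panel stacked by group, a stratification variable numbers the groups and a period variable counts observations within each group. The sketch below also computes the group means and deviations listed earlier. It assumes the rows of each group are contiguous and in time order; the names firm and wage are hypothetical.

```python
from collections import defaultdict

# Hypothetical unbalanced panel: three rows for firm A17, two for firm B02.
firm = ["A17", "A17", "A17", "B02", "B02"]
wage = [10.0, 12.0, 14.0, 20.0, 30.0]

# Stratification variable: groups numbered 1, 2, ... in order of first appearance.
index_of = {}
stratum = [index_of.setdefault(f, len(index_of) + 1) for f in firm]

# Period variable: 1, 2, ... within each group, in row order.
count = defaultdict(int)
period = []
for g in stratum:
    count[g] += 1
    period.append(count[g])

# Group means and deviations from group means.
total = defaultdict(float)
for g, w in zip(stratum, wage):
    total[g] += w
group_mean = [total[g] / count[g] for g in stratum]
deviation = [w - m for w, m in zip(wage, group_mean)]
```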

Leads and lags
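
A lag shifts a series forward in the data set so that each row holds an earlier value; a lead does the reverse. In this minimal sketch the function names lag and lead are hypothetical, and observations with no available value are filled with None, which is an assumption about how missing values are marked.

```python
def lag(series, k=1, missing=None):
    """Row t of the result holds series[t - k]; the first k rows are missing."""
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [missing] * k + list(series[:len(series) - k])

def lead(series, k=1, missing=None):
    """Row t of the result holds series[t + k]; the last k rows are missing."""
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return list(series[k:]) + [missing] * k

sales = [1, 2, 3, 4]          # hypothetical series
sales_lag1 = lag(sales, 1)    # [None, 1, 2, 3]
sales_lead2 = lead(sales, 2)  # [3, 4, None, None]
```

For panel data, the shift would need to be applied within each group so that values do not cross group boundaries.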

Matrix functions: dot products, quadratic forms
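
The two matrix functions named here have simple definitions: the dot product a'b and the quadratic form x'Ax. A minimal sketch with plain Python lists follows; the function names dot and quadratic_form are hypothetical.

```python
def dot(a, b):
    """Dot product a'b of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def quadratic_form(x, A):
    """Quadratic form x'Ax, where A is a square matrix stored as a list of rows."""
    if any(len(row) != len(x) for row in A) or len(A) != len(x):
        raise ValueError("A must be square and conformable with x")
    Ax = [dot(row, x) for row in A]
    return dot(x, Ax)

x = [1.0, 2.0]
A = [[2.0, 1.0],
     [1.0, 3.0]]
q = quadratic_form(x, A)  # Ax = [4, 7], so x'Ax = 1*4 + 2*7 = 18
```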

Sample statistical transformations