| PL/R User's Guide - R Procedural Language | ||||
|---|---|---|---|---|
| Prev | Fast Backward | Fast Forward | Next | |
Starting with version 8.4, PostgreSQL supports WINDOW functions which provide the ability to perform calculations across sets of rows that are related to the current query row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row; the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result. See the PostgreSQL documentation for more general information related to the use of this capability.
PL/R functions may be defined as WINDOW. For example:
CREATE OR REPLACE
FUNCTION r_regr_slope(float8, float8)
RETURNS float8 AS
$BODY$
  slope <- NA
  y <- farg1
  x <- farg2 
  if (fnumrows==9) try (slope <- lm(y ~ x)$coefficients[2])
  return(slope)
$BODY$
LANGUAGE plr WINDOW;
     
A number of variables are automatically provided by PL/R to the R interpreter:
farg1 and farg2 are R vectors containing the current row's data plus that of the related rows.
The number of rows in the current WINDOW frame.
Provides the 1-based row offset of the current row in the current PARTITION.
A more complete example follows:
-- create test table
CREATE TABLE test_data (
  fyear integer,
  firm float8,
  eps float8
);
-- insert randomly pertubated data for test
INSERT INTO test_data
SELECT (b.f + 1) % 10 + 2000 AS fyear,
	floor((b.f+1)/10) + 50 AS firm,
       f::float8/100 + random()/10 AS eps
FROM generate_series(-500,499,1) b(f);
CREATE OR REPLACE
FUNCTION r_regr_slope(float8, float8)
RETURNS float8 AS
$BODY$
  slope <- NA
  y <- farg1
  x <- farg2 
  if (fnumrows==9) try (slope <- lm(y ~ x)$coefficients[2])
  return(slope)
$BODY$
LANGUAGE plr WINDOW;
SELECT *, r_regr_slope(eps, lag_eps) OVER w AS slope_R
FROM (SELECT firm, fyear, eps,
  lag(eps) OVER (ORDER BY firm, fyear) AS lag_eps
FROM test_data) AS a
WHERE eps IS NOT NULL
WINDOW w AS (ORDER BY firm, fyear ROWS 8 PRECEDING);
     
In this example, the variables farg1 and farg2 contain the current row value for eps and lag_eps, as well as the preceding 8 rows which are also in the same WINDOW frame within the same PARTITION. In this case since no PARTITION is explicitly defined, the PARTITION is the entire set of rows returned from the inner sub-select.
Another interesting example follows. The idea of "Winsorizing" is to return either the original value or, if that value is outside certain bounds, a trimmed value. So for example winsor(eps, 0.1) would return the value at the 10th percentile for values of eps less that that, the value of the 90th percentile for eps greater than that value, and the unmodified value of eps otherwise.
CREATE OR REPLACE FUNCTION winsorize(float8, float8)
RETURNS float8 AS
$BODY$
  library(psych)
  return(winsor(as.vector(farg1), arg2)[prownum])
$BODY$ LANGUAGE plr VOLATILE WINDOW;
SELECT fyear, eps, 
  winsorize(eps, 0.1) OVER (PARTITION BY fyear) AS w_eps
FROM test_data ORDER BY fyear, eps;
     
In this example, use of the variable prownum is illustrated.