I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable which will be the cumulative sum of X. Normally I like to do grouping calculations with sqldf, but can't seem to find an equivalent for cumsum. That is:

sqldf("select P,T,X, cumsum(X) as X_CUM from df group by P,T")

doesn't work. Is this even possible with sqldf? I tried doBy, but that doesn't allow cumsum either.

1 Answer

Set up some test data:

DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

and now we have three solutions. First we use sqldf, as requested, with the default SQLite database. Next we use sqldf again, but this time with PostgreSQL through the RPostgreSQL or RpgSQL driver. PostgreSQL supports analytic window functions, which simplify the SQL. You will need to set up a PostgreSQL database first to run that one. Finally we show a pure R solution that uses only the core of R.

1) sqldf/RSQLite

library(sqldf)

sqldf("select a.*, sum(b.value) as cumsum
    from DF a join DF b
    using (p)
    where a.t >= b.t
    group by p, a.t"
)
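
As a side note, SQLite itself gained window functions in release 3.25.0, so the self-join can be avoided when the SQLite bundled with your RSQLite is at least that recent. The following is only a sketch under that assumption; it rebuilds the test data, forces the SQLite driver explicitly, and adds an order by so the row order is deterministic:

```r
library(sqldf)

DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Running total of value within each p, accumulated in order of t.
# Assumption: the SQLite bundled with RSQLite is 3.25.0 or later.
res <- sqldf("select *,
    sum(value) over (partition by p order by t) as cumsum
    from DF
    order by p, t",
    drv = "SQLite")
```

If the query fails with a syntax error near the over clause, the bundled SQLite is probably too old, and the self-join version above is the fallback.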

2) sqldf/RPostgreSQL

library(RPostgreSQL)
library(sqldf)

sqldf('select *,
    sum(value) over (partition by p order by t) as cumsum
    from "DF"'
)

(This also works with the RpgSQL PostgreSQL driver. To use it you must have Java installed and a PostgreSQL database set up; then, in place of the above, use: library(RpgSQL); sqldf(...) with the same SQL string, except that there should be no quotes around DF.)

3) Plain R

transform(DF, cumsum = ave(value, p, FUN = cumsum))
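
One caveat with the plain R version: ave() accumulates in whatever order the rows happen to be in, so it gives a running total over time only if each place's rows are already sorted by t. A minimal sketch of sorting first, using a shuffled copy of the test data (the names shuffled, sorted and result are made up for this illustration):

```r
DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Hypothetical shuffled copy, to mimic data that does not arrive in time order.
set.seed(1)
shuffled <- DF[sample(nrow(DF)), ]

# Sort by place, then time, before accumulating within each place.
sorted <- shuffled[order(shuffled$p, shuffled$t), ]
result <- transform(sorted, cumsum = ave(value, p, FUN = cumsum))
```

Here result holds the same running totals as the unshuffled DF would give, because the sort restores the time order within each p before cumsum is applied.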
...