A good idea would be to measure each iteration separately and then discard outliers, e.g. by dropping any measurement whose absolute difference from the mean exceeds the standard deviation.
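For concreteness, here's a minimal sketch of that filter, assuming the timings are just a list of numbers (the function name and test data are my own):

```python
import statistics

def discard_outliers(times):
    """Naive filter: keep only timings whose distance from the
    mean is at most one standard deviation."""
    mean = statistics.mean(times)
    stddev = statistics.stdev(times)
    return [t for t in times if abs(t - mean) <= stddev]

# One hint at the problem: for normally distributed data this
# throws away roughly 32% of perfectly good samples.
print(discard_outliers([1, 1, 1, 1, 10]))
```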

I leave it as an exercise to the reader to figure out why that's **not** a good idea. In any case, the last unit of the class dealt with the fun part of statistics, which is actually evaluating whether observed data is statistically significant. The math involved isn't too bad, assuming someone hands you a cumulative distribution function for the t-distribution (here is Boost's code), but it's a bit convoluted to write out here, so read Wikipedia's page. The correct test here is actually the two-sample t-test.
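As a sketch of what the two-sample test involves, here's Welch's version (the variant that doesn't assume equal variances) computing the t statistic and degrees of freedom; turning those into a p-value is exactly the step that needs the t-distribution CDF mentioned above. This is my own illustration, not code from the post:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom
    for two lists of timing measurements. Getting a p-value
    from these still requires the t-distribution's CDF
    (e.g. Boost's, or scipy.stats.t)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    # Squared standard error of the difference of the means
    se2 = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1)
                     + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df
```

With identical samples the statistic is zero, as expected; a large |t| relative to the df is what would let a profiler declare two runs significantly different.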

But I got to thinking. One thing I've wanted to see for a while is a profiling extension, one that would run some JS code snippet multiple times and produce various reports on the profiling runs. Wouldn't it be nice if such an extension could compare two runs and determine whether there was a statistically significant difference between them?

## 3 comments:

Dromaeo, the JS benchmark that John Resig wrote, does this. I don't know how easy it is to plug in extra tests, though.

https://wiki.mozilla.org/Dromaeo#Statistical_Confidence

The t-distribution is only accurate under the assumption that the population is normally distributed. Do you think that's a good assumption for code speed? (might be, I'm just not sure)

An alternative approach would be to just increase the sample size by running it 50 or so times.

Yeah, I guess it is... well, I'm not sure, but like you said, it's faster.
