Normal distribution sample simulation

Histograms are meaningless for datasets smaller than about 500 items; you will be better off using a dotplot. I think that the ‘error bar’ for each bar of the histogram can be approximated by the square root of its frequency. A bar with a frequency of 36 would then have a standard deviation of about 6. If a series of different samples were taken, its height would fall between 30 and 42 roughly two thirds of the time, and between 24 and 48 (±2 standard deviations) about 95% of the time.
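
The square-root rule can be checked with a quick simulation. This is a minimal Python sketch, not the author's spreadsheet. It assumes a bar that holds 36 items out of a sample of 1000 on average; the sample size, trial count and the name `simulate_bar_counts` are illustrative choices. Each item lands in the bar independently, so the bar's count is binomial with standard deviation sqrt(n·p·(1−p)). That is a little under sqrt(36) = 6 when p is small.

```python
import math
import random
import statistics

def simulate_bar_counts(expected=36, sample_size=1000, trials=2000, seed=1):
    """Return the count in one histogram bar for each of `trials` fresh samples.

    Assumption: each of the `sample_size` items falls in the bar
    independently with probability expected / sample_size.
    """
    rng = random.Random(seed)
    p = expected / sample_size  # chance that a single item lands in this bar
    return [
        sum(1 for _ in range(sample_size) if rng.random() < p)
        for _ in range(trials)
    ]

counts = simulate_bar_counts()
sd = statistics.pstdev(counts)
within_1sd = sum(30 <= c <= 42 for c in counts) / len(counts)  # 36 ± 6
within_2sd = sum(24 <= c <= 48 for c in counts) / len(counts)  # 36 ± 12

print(f"sqrt(36) = {math.sqrt(36):.1f}, simulated SD = {sd:.2f}")
print(f"30 to 42: {within_1sd:.0%} of samples; 24 to 48: {within_2sd:.0%}")
```

The simulated SD should come out just under 6. The first fraction should land near two thirds (slightly above, because the range includes both end points) and the second near 95%.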

A simulation of the histogram (and frequency distribution) for samples of 100 and 1000 drawn from a normal distribution shows how the bars bounce around much less, relative to their height, with the larger sample. Excel has functions (RAND() and COUNTIF()) that make it possible to build a spreadsheet that displays a new histogram each time you press F9. I cheated by approximating a normal distribution: each cell adds together 10 RAND() values. I then picked a scaling that gave a mean of 160 and a standard deviation of 8 or so. This approximates female height in centimetres, but with rather a high standard deviation.
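
The same idea can be sketched outside Excel. This is a hedged Python approximation of the spreadsheet, not a translation of it. Each RAND() is uniform on 0 to 1, with mean 1/2 and variance 1/12. The sum of 10 of them therefore has mean 5 and SD sqrt(10/12) ≈ 0.91. Multiplying the deviation from 5 by 8 / 0.91 ≈ 8.8 and adding 160 gives a mean of 160 and an SD of 8. The exact target SD of 8, the 5-unit bins from 130 to 190, the 200 redraws and all function names are assumptions made for illustration. `histogram` plays the role of one COUNTIF per bin, and each redraw stands in for pressing F9.

```python
import math
import random
import statistics

N_RAND = 10                      # number of RAND() values added per cell
SUM_MEAN = N_RAND * 0.5          # each RAND() has mean 1/2
SUM_SD = math.sqrt(N_RAND / 12)  # each RAND() has variance 1/12

def approx_normal(rng, mean=160.0, sd=8.0):
    """Add 10 uniform(0, 1) draws, then rescale to the target mean and SD."""
    total = sum(rng.random() for _ in range(N_RAND))
    return mean + (total - SUM_MEAN) * sd / SUM_SD

def histogram(values, lo=130, hi=190, width=5):
    """Frequency of each bin [edge, edge + width), like one COUNTIF per bin."""
    return {edge: sum(edge <= v < edge + width for v in values)
            for edge in range(lo, hi, width)}

def central_bar_counts(n, redraws=200, seed=0):
    """Redraw a sample of size n many times (like pressing F9 repeatedly)
    and record the frequency of the 160 to 165 bar each time."""
    rng = random.Random(seed)
    counts = []
    for _ in range(redraws):
        sample = [approx_normal(rng) for _ in range(n)]
        counts.append(histogram(sample)[160])
    return counts

results = {}  # sample size -> SD of the bar's count divided by its mean
for n in (100, 1000):
    counts = central_bar_counts(n)
    mean = statistics.mean(counts)
    sd = statistics.pstdev(counts)
    results[n] = sd / mean
    print(f"n={n}: mean count {mean:.1f}, SD {sd:.1f}, "
          f"relative wobble {sd / mean:.1%}")
```

The absolute SD of a bar grows with the sample size, but only as its square root. The relative wobble therefore shrinks by roughly sqrt(10) ≈ 3 when going from 100 to 1000 items. Summing 10 uniforms also cuts the tails off at about 5.5 SD from the mean, which makes no visible difference to a histogram.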

  • Download the Excel 98 spreadsheet (52 KB). Works on Excel 97 upwards and OpenOffice.
  • There is a very useful Java applet that allows you to vary the sample size over a larger range, and vary the mean and SD dynamically.
  • There is a nice JavaScript page where you can copy and paste data into a form, click a button, and calculate a range of summary statistics. One for the intranet!
