Figure C.1. Differing means of plotting data

Figure C.2. Identical data graphed in four ways

Presentation of data in an orderly manner often calls for a graphic display. This is made somewhat easier today, with the advent of graphics programs for the personal computer, but still requires the application of some basic techniques.

The first consideration is whether a graph is needed at all and, if so, which type of graph to use. For accuracy, a well-constructed table of data usually gives more information than a graph. The values obtained and their variability are readily apparent in a table, and interpolation (estimating values by reading them off the graph) is unnecessary. For visual impact, however, nothing is better than a graphic display.

There are a variety of graph types to choose from, e.g., line graphs, bar graphs and pie graphs. Each of these has its own characteristics and subdivisions. One also has to decide upon single or multiple graphs, two-dimensional or three-dimensional displays, presence or absence of error bars, and the aesthetics of the display. The last of these includes such details as legends, axis labels, titles, the symbols used to represent data, and the fill patterns for bar graphs.

Selection of a good graphics program for computer use will make most of these choices available. Refer to Appendix D for a quick review of such programs.

Unless you are specifically attempting to demonstrate an inverted function (which is confusing at best), the scales should always be arranged with the lowest value on the left of the x axis, and the lowest value at the bottom of the y axis. The range of each scale should be determined by the lowest and highest values of your data, with the scale rounded outward to the nearest ten, hundred, thousand, etc. That is, if the data ranges from 12 to 93, the scale should be from 10 to 100. It is not necessary to always start from 0, unless you wish to demonstrate the relationship of the data to this value (as, for example, in a Lineweaver-Burk plot of enzyme kinetics, or a spectrophotometric standard curve).

The number of intervals (tick marks) placed on the graph will be determined by the point you wish to make, but in general, one should use about ten divisions of the scale. For our range of 12 to 93, an appropriate scale would run from 0 to 100 (or from 10 to 100), with an interval of 10. Placing smaller intervals on the scale does not convey more information, but merely adds a lot of confusing marks to the graph. The user can estimate the values of 12 and 93 from such a scale without having every possible value ticked off.
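The scale rules above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular graphics program: the function names `axis_scale` and `tick_values` are hypothetical, and the choice of a power of ten as the tick interval is an assumption made to match the 12-to-93 example.

```python
import math

def axis_scale(data, divisions=10):
    """Return (low, high, step) for an axis that covers the data.

    Assumption: the tick interval is the power of ten closest to
    (range / divisions), and the limits are rounded outward to it.
    """
    lo, hi = min(data), max(data)
    if lo == hi:
        raise ValueError("data must span a non-zero range")
    raw_step = (hi - lo) / divisions
    step = 10 ** round(math.log10(raw_step))
    low = math.floor(lo / step) * step    # round the lower limit down
    high = math.ceil(hi / step) * step    # round the upper limit up
    return low, high, step

def tick_values(low, high, step):
    """List the tick positions from low to high inclusive."""
    count = round((high - low) / step)
    return [low + i * step for i in range(count + 1)]

low, high, step = axis_scale([12, 47, 93])
print(low, high, step)
print(tick_values(low, high, step))
```

For data ranging from 12 to 93 this gives a scale of 10 to 100 with an interval of 10, that is, nine divisions.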

An important rule of scale deals with multiple graphs drawn separately. If the graphs are to be compared, the scales MUST REMAIN CONSTANT. Nothing is more disconcerting than being shown two graphs with different axis values and being asked to compare the two. In that case it would be better merely to tabulate the data than to present it graphically.
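One simple way to keep scales constant is to compute the axis limits once, from all of the data sets together, and apply those limits to every graph. The sketch below assumes each data set is a plain list of y values; `shared_limits` is a hypothetical helper name, and the numbers are made up for illustration.

```python
def shared_limits(*datasets):
    """Return one (lowest, highest) pair covering every data set."""
    lowest = min(min(d) for d in datasets)
    highest = max(max(d) for d in datasets)
    return lowest, highest

# Two made-up data sets that are to be compared on separate graphs.
graph_a = [12, 40, 93]
graph_b = [30, 42, 55]

# Both graphs should then be drawn with these same limits.
print(shared_limits(graph_a, graph_b))
```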

Pie graphs are circular presentations which are drawn by summing your data and computing the percent of the total for each data entry. These percent values are then converted to portions of a circle (by multiplying the fraction of the total by 360°, or equivalently the percent by 3.6°), and the appropriate arc of a circle is drawn to represent each percent. By connecting the ends of each arc to the center point of the circle, the pie is divided into wedges, the size of which demonstrates the relative size of each data entry to the total. If one or more wedges are to be highlighted, that wedge can be drawn slightly out of the perimeter of the circle for what is referred to as an exploded view.
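The pie graph arithmetic can be sketched as follows. The function name `pie_wedges` and the sample values are assumptions made for illustration; each entry's share of the total is expressed both as a percent and as the angle of its wedge in degrees.

```python
def pie_wedges(values):
    """Map each label to (percent of total, wedge angle in degrees)."""
    total = sum(values.values())
    wedges = {}
    for label, value in values.items():
        percent = 100 * value / total
        degrees = 360 * value / total   # same as percent * 3.6
        wedges[label] = (percent, degrees)
    return wedges

# Made-up data: three categories summing to 100.
for label, (percent, degrees) in pie_wedges({"A": 50, "B": 30, "C": 20}).items():
    print(f"{label}: {percent:.1f}% of the total, {degrees:.1f} degree wedge")
```

The wedge angles always sum to 360°, which is a convenient check on the arithmetic.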

More typical of data presented in cell biology, however, are the line graph and the bar graph. There is no hard and fast rule for choosing between these graph types, except where the data is non-continuous. Then, a bar graph must be used. In general, line graphs are used to demonstrate data which is related on a continuous scale, whereas bar graphs are used to demonstrate discontinuous or interval data.

Suppose, for example, that you decide to count the number of T-lymphocytes in four slices of tissue, one each from the thymus, Peyer's patches, a lymph node and a healing wound on the skin. Let's label each of these as T, P, L and S respectively. The numbers obtained per cubic centimeter of each tissue are T=200, P=150, L=100 and S=50. Note that there is a rather nice linear decrease in the numbers if T is placed on the left of an x axis, and S to the right. A line graph of this data would give a nice straight line, with a statistical regression fit and slope. But look at the data! There is no reason to place T (or P, L or S) to the right or left of any other point on the graph - the placement is totally arbitrary. A line graph for this data would be completely misleading since it would imply that there is a linear decrease from the thymus to a skin injury AND that there was some sort of quantitative relationship among the tissues. There is certainly a decrease, and a bar graph could demonstrate that fact, by arranging the tissue types on the x axis in such a way as to demonstrate that relationship - but there is no inherent quantitative relationship between the tissue types which would force one and only one graphic display. Certainly, the thymus is not four times some value of skin (although the numbers are)!

However, were you to plot the number of lymphocytes with increasing distance from the point of a wound in the skin, an entirely different presentation would be called for. Distance is a continuous variable. We may choose to collect the data in 1 mm intervals, or 1 cm. The range is continuous from 0 to the limit of our measurements. That is, we may wish to measure the value at 1 mm, or 1.2 mm or 1.23 mm or 1.23445 mm. The important point is that the 2 mm position is twice as far from the wound as the 1 mm position. There is a linear relationship between the values to be placed on the x axis. Therefore a line graph would be appropriate, with the dots connected by a single line. If we choose to ignore the 1.2 and 1.23 and round these down to a value of 1, then a bar graph would be more appropriate. This latter technique (dividing the data into appropriate intervals and plotting as a bar graph) is known as a Histogram. This graph is very familiar to students since it is the graph used for the display of grade distributions.
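The rounding-down step that turns continuous measurements into a Histogram can be sketched as below. The function name `histogram`, the 1 mm interval width and the sample distances are assumptions for illustration; each measurement is assigned to the interval whose lower edge is at or below it, and the number of measurements per interval is counted.

```python
import math
from collections import Counter

def histogram(values, width=1.0):
    """Count how many values fall in each interval of the given width.

    Keys are the lower edge of each interval; intervals with no
    values are kept, with a count of zero.
    """
    bins = [math.floor(v / width) for v in values]   # round each value down
    counts = Counter(bins)
    return {i * width: counts[i] for i in range(min(bins), max(bins) + 1)}

# Made-up distances (mm) from the wound at which cells were recorded.
distances = [0.4, 1.0, 1.2, 1.23, 1.23445, 2.7]
for lower_edge, count in histogram(distances).items():
    print(f"{lower_edge:4.1f} mm | {'#' * count}")
```

Here 1.0, 1.2, 1.23 and 1.23445 all fall in the interval that starts at 1 mm, so that bar has a height of four.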

Having decided that the data has been collected as a continuous series, and that the data will be plotted on a line graph, there are still decisions to be made. Should the data be placed on the graph as individual points with no lines connecting them (a Scattergram)? Should a line be drawn between the points (known as a Dot-to-Dot)? Should the points be plotted, but curve smoothing be applied? If the latter, what type of smoothing?

There are many algorithms for curve fitting, with the two most commonly used being linear regression and polynomial regression. It is important to decide BEFORE graphing the data which of these is appropriate.

Linear regression is used when there is good reason to suspect a linear relationship within the data (as, for example, in a spectrophotometric standard curve following the Beer-Lambert law). In general, the y value can be calculated from the equation for a straight line, y = mx + b, where m is the slope and b is the y-intercept.
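As a minimal sketch, the slope m and intercept b of y = mx + b can be estimated by ordinary least squares, which is the usual meaning of linear regression. The function name `linear_fit` and the sample points are assumptions for illustration.

```python
def linear_fit(xs, ys):
    """Return (m, b) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Made-up points lying exactly on y = 2x + 1.
m, b = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(f"slope = {m:.3f}, intercept = {b:.3f}")
```

The function returns a slope and intercept for any set of points with at least two distinct x values, whether or not a straight line is a sensible model for them.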

Computer programs for this can be very misleading. Any set of data can be entered into a program to calculate and plot a linear regression. It is important that there be a valid reason for supposing linearity before using this function, however. This is also true when using polynomial regressions. This type of regression calculates an ideal curve based on polynomial equations of increasing degree, that is, equations containing terms in x^n, where n is greater than 1 (for example, y = ax^2 + mx + b for n = 2). The mathematics of this can become quite complex, but often the graphic displays begin to look better to the beginning student. It is important to note that use of polynomial regression must be warranted by the relationship within the data, not by the individual drawing the graph.
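For readers curious about what such a program does internally, the following is a rough sketch of a least-squares polynomial fit. It is not the routine used by any particular graphics package: the name `poly_fit` is hypothetical, and solving the normal equations by Gaussian elimination is a choice made here for brevity that becomes numerically unreliable at high degrees.

```python
def poly_fit(xs, ys, degree):
    """Return coefficients [c0, c1, ..., cn] of the least-squares
    polynomial y = c0 + c1*x + ... + cn*x**n, with n = degree."""
    size = degree + 1
    # Normal equations A c = r: A[i][j] = sum(x**(i+j)), r[i] = sum(y * x**i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            r[row] -= factor * r[col]
    # Back substitution.
    coeffs = [0.0] * size
    for i in reversed(range(size)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, size))
        coeffs[i] = (r[i] - known) / a[i][i]
    return coeffs

# Made-up points lying exactly on y = x**2 + 2x + 1.
print(poly_fit([0, 1, 2, 3, 4], [1, 4, 9, 16, 25], degree=2))
```

Raising the degree always brings the curve closer to the plotted points, which is exactly why a good-looking fit is not evidence that the polynomial model is warranted.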

For single sets of data, that is the extent of the available options. For multiple sets the options increase. If the multiple sets are data collected at identical x (abscissa) values, then error bars (standard deviation or standard error of the mean) can be added to the graph. Alternatively, two lines can be drawn, one connecting the highest y value for each x and a second connecting the lowest (the Hi-Lo Graph). The area between the two lines presents a graphic depiction of the variability at each x value.
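The quantities needed for error bars and for a Hi-Lo Graph can be computed for each x value as sketched below. The function name `summarize` and the sample measurements are assumptions for illustration; the standard deviation used is the sample standard deviation, and the standard error of the mean is that value divided by the square root of the number of replicates.

```python
import math
import statistics

def summarize(replicates):
    """For each x, summarize its repeated y measurements.

    replicates maps an x value to a list of at least two y values.
    """
    summary = {}
    for x, ys in replicates.items():
        sd = statistics.stdev(ys)                # sample standard deviation
        summary[x] = {
            "mean": statistics.mean(ys),
            "sd": sd,
            "sem": sd / math.sqrt(len(ys)),      # standard error of the mean
            "lo": min(ys),                       # lower line of a Hi-Lo Graph
            "hi": max(ys),                       # upper line of a Hi-Lo Graph
        }
    return summary

# Made-up triplicate measurements at two x values.
for x, s in summarize({1: [2, 4, 6], 2: [5, 5, 8]}).items():
    print(x, s)
```

Whichever measure is plotted, the figure legend should state whether the bars show the standard deviation or the standard error, since the two differ by a factor of the square root of the number of replicates.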

If the data collected involves two or more sets of data having a common x axis, but varying y axes (or values), then a multiple graph may be used. The rules for graphing apply to each set of data, with the following provision: keep the number of data sets on any single graph to an absolute minimum. It is far better to have three graphs, each with 3 lines (or bars), than to have a single graph with 9 lines. A graph that contains an excess of information (such as 9 lines) is usually ignored by the viewer (as are tables with extensive lists of data). For this same reason, all unnecessary clutter should be removed from the graph; e.g., grid marks on the graph are rarely useful.

Finally, it is possible to plot two variables, y and z, against a common value, x. This is done with a 3D graphics program. The rules for designing a graph also hold for this type of graph, and its construction should clearly be left to a computer graphics program. These graphs often look appealing with their hills and valleys, but rarely impart any more information than two separate 2D graphs. Perhaps the main reason is that people are familiar with two-dimensional graphics, but have a more difficult time visually interpreting three-dimensional graphs.

Cell Biology Laboratory Manual

Dr. William H. Heidcamp, Biology Department, Gustavus Adolphus College,

St. Peter, MN 56082 -- cellab@gac.edu