5.4 Graphical Methods, Histograms and Scatter Diagrams



6. Graphical Methods

Graphical methods of data analysis include box plots, stem and leaf plots, run charts or trend charts, scatter diagrams, histograms, normal probability plots, and Weibull plots. Data constitute the foundation for statistical analysis. Charts, graphs, and pictures are among the best ways to analyze data and measure a process. They are the most commonly used tools for displaying and analyzing data because they offer a quick and easy way to visualize the characteristics of the data, and they show and compare changes and relationships.

1. Box Plots summarize information about the distribution of values instead of plotting the actual values. A box plot helps to show the central tendency, spread, and variability of the observations. (See topic: Process Analysis and Documentation in the previous section.)
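As a rough illustration of the summary values a box plot is built from, the sketch below computes a five-number summary and the usual 1.5 × IQR outlier fences. The function names, the quartile convention, and the sample values are assumptions made for illustration; they do not come from the text.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" is one of several quartile conventions;
    # other conventions give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def outlier_fences(q1, q3, k=1.5):
    """Tukey fences: points outside these limits are often drawn as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up values, for illustration only
low, q1, med, q3, high = five_number_summary(sample)
fences = outlier_fences(q1, q3)
```

With these made-up values the box would span 4 to 8 around a median of 6, and the value 30 falls outside the upper fence of 14, so it would typically be drawn as a separate outlier point.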

2. Stem and Leaf Plots are a variation of histograms and are useful for small data sets (n < 200). (See topic: Process Analysis and Documentation in the previous section.)
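A minimal sketch of how a stem and leaf plot can be built follows. It assumes non-negative integers with the tens digit as the stem and the units digit as the leaf; the function names and the sample values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted leaves (units digit).

    Assumes non-negative integers; other data would need rescaling first.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the plot as text, one line per stem, including empty stems."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return "\n".join(lines)

sample = [12, 15, 21, 23, 23, 38, 41, 47]  # made-up values
text = render(stem_and_leaf(sample))
```

Unlike a histogram, the rendered plot keeps every individual value readable, which is why it suits small data sets.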

3. Trend Charts / Run Charts

Trend charts (also known as run charts) are typically used to display trends in data over time. A trend chart is a quality improvement technique used to monitor processes. A goal line can also be added to the chart to mark the target to be achieved. One of the main advantages of this chart is that it helps in discovering patterns that occur over a period of time.

Uses of Trend Charts

Using trend charts can lead to improved process quality. 

Trend charts should be used for introductory analysis of continuous data or of data arranged in time order. A trend chart of continuous data should be drawn before any other analysis is done. Run charts are analyzed to find out whether the patterns in the data arise from common causes or from special causes of variation. The run chart answers questions such as “Was the process under statistical control for the observed period?” If the answer is no, then special causes of variation must have affected the process. If the answer is yes, then process capability analysis can be used to approximate the long-term performance of the process. (See topic: Process Capability Analysis.)

A run chart should not be used if more than 30% of the data values are the same. Run charts are also less sensitive than SPC control charts: they cannot detect single points that are characteristically different from the others, so they may fail to detect special causes of variation even when these are present.
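The 30% rule can be checked mechanically before a run chart is drawn. The sketch below does that and also counts runs about the median, a common way of screening a run chart for non-random patterns. The runs count is an assumption added for illustration, not a method stated in the text, and the function names and data are hypothetical.

```python
import statistics
from collections import Counter

def too_many_ties(values, limit=0.30):
    """True if the most common value makes up more than `limit` of the data."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) > limit

def runs_about_median(values):
    """Count runs of consecutive points on the same side of the median.

    Points exactly on the median are skipped, a common convention.
    """
    median = statistics.median(values)
    sides = [v > median for v in values if v != median]
    if not sides:
        return 0
    return 1 + sum(1 for a, b in zip(sides, sides[1:]) if a != b)

measurements = [5, 7, 6, 9, 8, 12, 11, 13, 10, 14]  # made-up, in time order
usable = not too_many_ties(measurements)
runs = runs_about_median(measurements)
```

Here the ten made-up points form only two runs (five below the median, then five above), which would hint at a shift or trend rather than random variation.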

The various steps involved in creating a trend chart are: 

Data gathering: The data should be collected over a period of time and it should be gathered in a chronological manner. The data collection can start at any point and end at any point. 

Data organizing: The collected data is then integrated and is divided into two sets of values, i.e., x and y. The values for ‘x-axis’ represent time, and the values for ‘y-axis’ represent the measurements taken from the source of operation. 

Preparing the chart: The y values versus the x values are plotted, using an appropriate scale that will make the points on the graph visible. Next, vertical lines for the x values are drawn to separate time intervals such as weeks. Horizontal lines are drawn to show where trends in the process, or in the operation, occur or will occur. 

Interpreting the chart: After preparing the chart, the data is interpreted and conclusions are drawn that will be beneficial to the process or operation. 
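The steps above can be sketched in a few lines: chronological records are split into x and y values, then drawn as a simple text chart with points above a goal line flagged. The record format, the goal value, and the function names are assumptions, and the data are hypothetical.

```python
def organize(records):
    """Split chronologically ordered (time, measurement) records into x and y."""
    xs = [t for t, _ in records]
    ys = [m for _, m in records]
    return xs, ys

def text_trend_chart(xs, ys, goal):
    """Render one row per time point and flag points above the goal line."""
    rows = []
    for x, y in zip(xs, ys):
        flag = "  <- above goal" if y > goal else ""
        rows.append(f"{x:<6} {'*' * y}{flag}")
    return "\n".join(rows)

# Hypothetical weekly measurements (whole numbers), already in time order.
records = [("Week1", 4), ("Week2", 6), ("Week3", 3), ("Week4", 5)]
xs, ys = organize(records)
chart = text_trend_chart(xs, ys, goal=4)
```

A spreadsheet or statistics package would normally draw the chart; the text rendering only shows how the organized x and y values map onto it.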

Example: Suppose you are the new manager in a company and you are disturbed by the trend of certain employees coming in late. You decide to monitor the employees’ punctuality over the next four weeks by noting how late they are each day (on average) and then constructing a trend chart.

Data Gathering: Cluster the data for each day over the next four weeks. Record the data in an ordered manner as shown in the following:




Organizing Data: Determine which values go on the x-axis and which go on the y-axis. Here, the day of the week goes on the x-axis and the time by which employees are late goes on the y-axis.




Preparing the chart: Plot the y values versus the x values on graph paper or with a computer tool such as Excel or Minitab. Draw horizontal or vertical lines on the graph where trends or deviations occur.




Interpreting Data: Conclusions can be drawn once the trend chart has been prepared. Results can then be interpreted by the analysts in the analysis phase. It is very clear from the chart above that employees usually take more time to reach the office on Mondays.
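A conclusion of this kind can be checked by averaging the recorded lateness per weekday. The sketch below uses hypothetical figures, NOT the data from the example (which is not reproduced here), and the function name is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean

def average_by_weekday(records):
    """Average the lateness (in minutes) recorded for each weekday."""
    grouped = defaultdict(list)
    for weekday, minutes in records:
        grouped[weekday].append(minutes)
    return {day: mean(vals) for day, vals in grouped.items()}

# Hypothetical figures for two weeks, NOT taken from the example.
records = [
    ("Mon", 20), ("Tue", 8), ("Wed", 6), ("Thu", 7), ("Fri", 9),
    ("Mon", 24), ("Tue", 10), ("Wed", 8), ("Thu", 5), ("Fri", 11),
]
averages = average_by_weekday(records)
worst_day = max(averages, key=averages.get)
```

With these invented numbers the largest weekday average falls on Monday, which is the kind of pattern the trend chart is meant to reveal.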

4. Histograms

A histogram is a pictorial representation of a set of data in the form of a vertical bar graph that displays the required information concisely. It is constructed from a frequency table and is therefore also called a frequency histogram. It depicts the distribution or variation of data over a range (for example age, size, length, or number), including its dispersion and central tendency. It also reveals the shape of the data: normal, bimodal, saw-toothed, cliff-like, skewed, and so on.

The shape of a histogram varies with the choice of interval size. The horizontal axis depicts the range and scale of the observations. The vertical axis shows the number of data points in each interval, i.e., the frequency of observations in that interval. The values marked on the horizontal axis are the upper limits of the intervals.
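The effect of the interval size can be seen by building the frequency table twice with different widths. The sketch below assumes equal-width intervals that include their lower limit and exclude their upper limit; the function names and the sample are hypothetical.

```python
def frequency_table(values, low, width, bins):
    """Count values in `bins` equal-width intervals starting at `low`.

    Interval i covers [low + i*width, low + (i+1)*width); values outside
    the overall range are ignored.
    """
    counts = [0] * bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts

def render_histogram(counts, low, width):
    """Text rendering, one row per interval, labelled by its upper limit."""
    lines = []
    for i, count in enumerate(counts):
        upper = low + (i + 1) * width
        lines.append(f"< {upper:>3} | {'#' * count}")
    return "\n".join(lines)

sample = [3, 7, 8, 12, 13, 14, 15, 18, 22, 27]  # made-up values
counts = frequency_table(sample, low=0, width=10, bins=3)
narrow = frequency_table(sample, low=0, width=5, bins=6)
```

The same ten values give a single central peak with a width of 10 and a flatter, more detailed profile with a width of 5. The text rendering is rotated relative to the usual vertical bars.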

Uses of a Histogram

A histogram makes it easy to see how the data are scattered (their dispersion and central tendency), and so shows where the variable takes critical values. It also makes it easy to compare the distribution with the process requirements.

A histogram is a practical way to identify a distribution. With very large samples, the histogram comes close to the shape of the underlying distribution, which makes it easier to identify the population distribution.

Histograms are also used as a quality control tool, both in analyzing quality control problems and in finding possible answers to them. However, histograms should be drawn along with control charts or run charts, because histograms do not show the time sequence of the data and therefore cannot reveal an ‘out of control’ process.

Histograms help in finding solutions for process improvement. When histograms from different time periods are compared, patterns in them can be studied for possible solutions. 

When data is obtained from different sources, the data can be stratified by plotting different histograms. 

Other uses of a histogram are listed below: 

  • To check whether the output of a process is normally distributed or not
  • To check whether the customer’s requirement can be met by the current process
  • To check for process change
  • To check the differences in outputs of multiple processes
  • To communicate data in a faster and easier way


A histogram is an efficient tool for the early phase of data analysis. For a better analysis, it is combined with the concept of the normal curve. A few questions are generally used to interpret the histogram:

  • Is the process working within the stipulated specification limits?
  • Is the process exhibiting wide variation?
  • What action, if any, should be taken?
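The first two questions can be given rough numerical answers alongside the histogram. The sketch below reports the fraction of observations inside the specification limits together with the mean and sample standard deviation. The limits, the data, and the function name are assumptions for illustration.

```python
import statistics

def summarize_against_spec(values, lower_spec, upper_spec):
    """Return the fraction within specification plus mean and std deviation."""
    within = sum(lower_spec <= v <= upper_spec for v in values)
    return {
        "fraction_within_spec": within / len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

sample = [9.8, 10.1, 10.0, 10.4, 9.6, 10.2, 9.9, 10.7]  # made-up values
summary = summarize_against_spec(sample, lower_spec=9.5, upper_spec=10.5)
```

In this made-up sample one of the eight values lies above the upper limit, so the fraction within specification is 0.875. What action follows from that remains a judgment for the analyst.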


Disadvantages of Histograms

At least 50 samples are needed for a histogram to represent the true behavior of the data. Although histograms give information about spread and distribution, they do not give adequate information about the state of control of the process. Because the samples are not plotted in any particular order, time-related trends in the process being studied are not depicted.

Histograms are an important tool in the initial phase of data analysis because of the ease with which they can be created. But in statistical process control, the histogram gives no clue as to how the process was operating at the time of data collection.

In the example discussed previously about employees who come late, the histogram can show how the data is dispersed (on a daily basis) for the duration of a month:




5. Scatter Diagrams

(See Chapter 6- Six Sigma, Analyze) 

6. Probability Plots

Probability plots are a graphical technique to check which distribution (e.g. normal, Weibull etc.) a particular data set is following. This technique is used to verify the collected data against any known distribution. A probability plot shows the probability of a certain event occurring at different places within a given time period. Each sample is selected in such a manner that each event within the sample space has a known chance of being selected. While sampling for any event, every observation from which the sample is drawn has a known probability of being selected into the sample. 

Probability plots give a better insight into the physical environment of a process. With moderately small samples, probability plots produce reasonable results. Probability plots show estimates of process yields. 

Probability is usually expressed on a scale from 0 to 1. An event that is very likely to occur has a probability close to 1, and an event that is very unlikely to occur has a probability close to 0.

When plotted on a graph, these events usually bunch around the mean, forming a bell curve (See topic: Basic Process Capability). This theoretical distribution of events allows the probability of a certain event occurring in the sample space to be calculated.

Interpretation of Probability Plots

A probability plot consists of a center line and two outer bands, one above the center line and one below it. The nearer the data points are to the center line, the better they are thought to fit the distribution. If all the points lie within the two outer bands, the data set is thought to be a good fit to the probability model being used.

A straight line in a probability plot indicates that the data set follows that particular distribution. A bend in the plot suggests that the data come from two or more distributions.
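To make the "straight line" criterion concrete, the sketch below pairs each sorted observation with a theoretical normal quantile and measures how close the points are to a line using a correlation coefficient. The plotting position (i − 0.5)/n and the use of correlation as a straightness measure are assumptions chosen for illustration, and the data are made up. The outer bands described above are not computed.

```python
from statistics import NormalDist, mean

def normal_probability_points(values):
    """Pair each sorted observation with its theoretical normal quantile.

    Uses the plotting position (i - 0.5) / n, one of several conventions.
    """
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, data))

def straightness(points):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

symmetric = [1, 2, 3, 4, 5]                   # made-up, roughly symmetric
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20, 60]     # made-up, strongly skewed
r_symmetric = straightness(normal_probability_points(symmetric))
r_skewed = straightness(normal_probability_points(skewed))
```

The symmetric sample gives a correlation close to 1, while the skewed sample gives a clearly lower value, matching the visible bend its normal probability plot would show.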

An advantage of a probability plot is that the data need not be divided into intervals. Probability plots also work better than histograms when there are few data points.

On the other hand, a probability plot is only useful if the correct probability distribution is chosen.

7. Goodness of Fit Tests

A goodness of fit test is a type of statistical test in which the validity of one hypothesis is tested without specifying an alternative hypothesis.

The procedure for such a test is,

1. To define a test statistic (some function of data measuring the distance between the hypothesis and data) and 
2. To calculate the probability of obtaining data which have a still larger value of this test statistic than the value observed, assuming the hypothesis is true. 

The resulting probability is known as the size of the test or the confidence level (it is often reported as the p-value).

Probabilities of less than 1% indicate a poor fit. Probabilities close to 1 (that is, 100%) indicate a fit that is too good to occur very often and may be a sign of an error.

The most common tests for goodness of fit are the chi-square test, the Kolmogorov test, the Cramer-Smirnov-Von-Mises test, the runs test, etc.

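The Pearson chi-square test is used to test whether an observed distribution conforms to a hypothesized distribution. The method consists of organizing the observations into a frequency table with classes. The formula is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed frequency in class i, E_i is the frequency expected in class i under the hypothesized distribution, and k is the number of classes.

The number of degrees of freedom is k − p − 1

Here, k = the number of classes and p = the number of parameters estimated from the data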
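A small worked example may help. The sketch below computes the Pearson statistic, the sum over classes of (observed − expected)² / expected, and the degrees of freedom under the standard convention of classes minus estimated parameters minus one. The die-rolling data and the function names are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected per class."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(num_classes, num_estimated_params=0):
    """Classes minus estimated parameters minus one (standard convention)."""
    return num_classes - num_estimated_params - 1

# Made-up example: 60 rolls of a die, with expected counts for a fair die.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
statistic = chi_square_statistic(observed, expected)
dof = degrees_of_freedom(len(observed))
```

For these invented counts the statistic is about 4.2 with 5 degrees of freedom, well below the tabulated 5% critical value of 11.07, so the hypothesis of a fair die would not be rejected.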

The Kolmogorov-Smirnov (K-S) test is used to test a sample for distributional adequacy, that is, to determine whether the sample being studied comes from a population with a specific distribution. The test has its own characteristics and limitations. The distribution of the K-S test statistic does not depend on the cumulative distribution function being tested. Unlike the chi-square goodness-of-fit test, which relies on an adequate sample size for its approximations to be valid, the K-S test is an exact test.
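The K-S statistic itself is simple to compute: it is the largest vertical gap between the empirical cumulative distribution of the sample and the hypothesized cumulative distribution function. The sketch below assumes a fully specified continuous distribution (here a standard normal) and uses made-up data; it computes only the statistic, not the associated probability.

```python
from statistics import NormalDist

def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and the hypothesised `cdf`."""
    data = sorted(values)
    n = len(data)
    gaps = []
    for i, x in enumerate(data, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        gaps.append(max(i / n - f, f - (i - 1) / n))
    return max(gaps)

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # made-up values
d = ks_statistic(sample, NormalDist(mu=0, sigma=1).cdf)
```

The resulting value would then be compared with tabulated critical values for the sample size to decide whether the hypothesized distribution is plausible.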

(For more details on Goodness of Fit Tests, refer to chapter 6: Black Belt, Analyze)