pctower

05-16-2004, 01:05 PM

The term "null hypothesis" has been thrown around a lot on this site, as if it were some mysterious term demonstrating that it hasn't been proved that cables can produce different audible effects. People also frequently refer to double blind tests that supposedly demonstrated that the participants could not tell the difference between any of the cables tested.

As an aside, I would suggest that in order to assess whether these test results are reliable, one must be able to assess the protocol that was followed. In other words, one needs to know exactly how the tests were conducted. I have yet to see a reported cable DBT provide that level of detail.

However, the main point of this post is to suggest that in evaluating the reliability or significance of any test results, it is critical to know the details of the raw data that resulted from such a test and the statistical analysis that was applied to that raw data. Sometimes the actual raw data is not even presented in the test report. When it is presented, the tests have invariably involved a fairly low number of actual trials.

I'm no statistician by any means. But here's some information I believe is important.

Null Hypothesis: "The null hypothesis is a term that statisticians often use to indicate the statistical hypothesis tested. The purpose of most statistical tests is to determine if the obtained results provide a reason to reject the hypothesis that they are merely a product of chance factors. For example, in an experiment in which two groups of randomly selected subjects have received different treatments and have yielded different means, it is always necessary to ask if the difference between the obtained means is among the differences that would be expected to occur by chance whenever two groups are randomly selected. In this example, the hypothesis tested is that the two samples are from populations with the same mean. Another way to say this is to assert that the investigator tests the null hypothesis that the difference between the means of the populations from which the samples were drawn is zero. If the difference between the means of the samples is among those that would occur rarely by chance when the null hypothesis is true, the null hypothesis is rejected and the investigator describes the results as statistically significant."

See: http://www.animatedsoftware.com/statglos/sgnullhy.htm
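The quoted example can be made concrete with a small simulation. The sketch below (using only the Python standard library; all the numbers are hypothetical, not drawn from the linked glossary) shows that two groups sampled from the very same population will still differ in their means purely by chance, and that "statistically significant" just means an observed difference falls outside what that chance distribution typically produces:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: under the null hypothesis, both groups come
# from the SAME population, so any difference in sample means is chance.
population = [random.gauss(100, 15) for _ in range(10_000)]

diffs = []
for _ in range(5_000):
    group_a = random.sample(population, 30)
    group_b = random.sample(population, 30)
    diffs.append(statistics.mean(group_a) - statistics.mean(group_b))

# An observed difference would be called "statistically significant" only
# if it lands in the rare outer tail of this chance distribution (here,
# the outer 5% of absolute differences).
cutoff = sorted(abs(d) for d in diffs)[int(0.95 * len(diffs))]
print(f"95% of purely-chance differences fall below {cutoff:.2f}")
```

The point is that a nonzero difference between two sample means proves nothing by itself; the question is always whether it exceeds what chance alone routinely generates.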

So the null hypothesis is nothing mysterious. It is simply a conceptual framework for testing a hypothesis. In our case, the hypothesis might be that the two cables under test in a DBT produce audibly different results. Presumably, to test that hypothesis we would prevent the listeners from seeing or knowing which cable they are hearing on any given trial. If they correctly identify when an actual change of cables has been made enough times to reasonably rule out chance as the explanation for their "identifications," then we have a test that supports the hypothesis that the two cables do produce audibly different results.

The trick is knowing how many trials need to be run and how many correct answers are required to rule out chance. This is not a simple matter.
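For a yes/no identification trial, the arithmetic behind "how many correct answers rule out chance" is a one-sided binomial calculation. A minimal sketch (the trial counts 10 and 20 are hypothetical examples, not from any actual test):

```python
from math import comb

def p_value(n_trials: int, n_correct: int, p_chance: float = 0.5) -> float:
    """Probability of getting at least n_correct answers right in
    n_trials if the listener is purely guessing (one-sided binomial)."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# 8 of 10 correct looks impressive, but a pure guesser does that often
# enough that it misses the conventional 0.05 significance level...
print(f"8/10 correct: p = {p_value(10, 8):.4f}")   # ~0.0547

# ...while the same 80% hit rate over 20 trials clears it comfortably.
print(f"16/20 correct: p = {p_value(20, 16):.4f}")  # ~0.0059
```

This is why the raw trial count matters so much: the identical hit rate can be statistically meaningless in a short test and significant in a longer one.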

I recommend the following site which attempts to discuss these issues in lay terms:

http://www.psychstat.smsu.edu/introbook/sbk00.htm

If you can wade through the introductory sections, I recommend the following section as particularly relevant to the type of DBTs that are often discussed at this site: http://www.psychstat.smsu.edu/introbook/sbk26.htm

Note that in this particular example the results in the second year might also have been different from the first year if the size of alpha had remained constant from the first year, but the number of trials had been increased, as increasing the number of trials would have been another way to decrease Type II errors.
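That trade-off can be sketched numerically. In the toy calculation below, the listener's "true" 70% hit rate is a purely hypothetical assumption (not a figure from the linked page); holding alpha near 0.05 while increasing the number of trials drives the Type II error rate down:

```python
from math import comb

def p_at_least(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n trials at success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_value(n: int, alpha: float = 0.05) -> int:
    """Smallest number of correct answers whose probability under pure
    guessing (p = 0.5) is at or below alpha."""
    k = n
    while k > 0 and p_at_least(n, k - 1, 0.5) <= alpha:
        k -= 1
    return k

# Hypothetical listener who genuinely hears a difference 70% of the time.
# Beta (Type II error) = chance the test still fails to reject the null.
for n in (10, 20, 40, 80):
    k = critical_value(n)
    beta = 1 - p_at_least(n, k, 0.7)
    print(f"n = {n:3d}: need {k} correct, Type II error = {beta:.2f}")
```

With only 10 trials, such a listener would fail to produce a significant result most of the time; a null result from a short test is therefore weak evidence of inaudibility.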

(Note: for a more detailed discussion of alpha and beta, follow the links from the first site I posted above.)

I present this material merely in an effort to indicate that if we really want to get "scientific" about blind testing, then we must realize that statistical analysis is critical to determining the reliability of the tests we're looking at. If someone cites the results of a particular cable DBT, I think it entirely appropriate to challenge him to provide detail on the raw data and the statistical analysis that was applied. If he cannot, I suggest that the test cannot be considered scientifically reliable. If he can produce the detail, then it must be analyzed carefully in order to decide whether the test results are reliable.
