Interegg Correlations

Peter Bancel has been exploring the GCP/EGG data from September 11, 2001. This is a work in progress, but the early results are interesting and thought-provoking. First is an extract from an email of Sept 26 in which he gives a general description of his methods. Following that is the draft report he sent a day later. Of special note is the graph of autocorrelation of Z-scores between 08:00 and 12:00, the four-hour period that contains the major events of the terrorist attack. A more in-depth explanation of the procedures follows.

"Basically, what I do is do an autocorrelation of the sec-by-sec Stouffer z's over regs, using Fourier techniques. The resulting autocorrelations are then normalized to the Sqrt (of the number of data pts-the lag). This gives a distribution of "z" values that should be very closely N(0,1) distributed. I then visualize the result by taking the cumulative sum, much as we do for a classic reg experiment. It is interesting to look for a positive sum at short times(ie, a few minutes)-indicating perhaps an overall correlation in the data-as well as the behavior at lag times of up to two hours, which should be sensitive to correlations resulting from successive "impulses" due to the sub-events (ie, wtc hit 1, wtc hit 2, pentagon hit, wtc collapses 1 & 2)."

Draft Report

Here is a summary of some of the autocorrelation analysis that I've been doing on the GCP data from September 11. It is rough and preliminary, intended to give a flavor of this approach to the data. Enjoy.
-Peter B.

[Graphics:autocorr9=11gr1.gif]

September 11 data. Cumulative sum of the second-by-second autocorrelation using Stouffer z-scores calculated across regs. The plot extends to a lag time of 10800 seconds (3 hrs). One data point corresponds to a 5-second lag. The red curves are 0.05 probability envelopes.

[Graphics:autocorr9=11gr3.gif]


The Event. This autocorrelation uses a 12-hour window around the disaster (7am to 7pm EDT). The window excludes any data before 7am. One possibility in exploring autocorrelations is to get some insight into time structure in the data that corresponds to different stages of the event. Two things are worth noting in the plot. First, there is a strong rising trend out to about 1100 on the plot. This suggests a relatively persistent positive autocorrelation in the data for a period of at least 92 minutes (one data point is 5 secs, so 1100 points is about 92 minutes), which roughly corresponds to the length of the disaster event. After a lag of 92 minutes the autocorrelation sum falls off, and the descent is fairly smooth. Second, the rising portion contains visibly more structure than the later part of the plot. In particular, there seem to be 'impulses' at lag times of roughly 22.5, 42, 58 and 83 minutes (270, 500, 700 and 1000 on the plot).

[RDN note: See also the four-hour series in the set of graphs at the end of the page, which looks at the day in four-hour chunks. The graph for 8:00 to 12:00 EDT is exceptional, showing a steady positive cumulative autocorrelation, which suggests that the eggs were all subject to a common influence that resulted in a tendency toward similar behavior.]

To look at the structure more closely, I focus on the four sub-events that were broadcast live, since these (as a working hypothesis) are likely to provide the strongest "impulses" registered by the reg network. The events are the second strike on the WTC, the Pentagon strike, and the collapses of towers 1 and 2. These occurred at 9:03, 9:43, 10:05 and 10:28, according to the European press. These four events define six pairwise lag times of 22, 23, 40, 45, 62 and 85 minutes. If you look at the autocorrelation sum you can see that these times correspond roughly to regions where the sum makes a marked upward movement.
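[Note: as a quick check of the arithmetic (not part of the report), the six lag times follow from the pairwise differences of the four event times:]

```python
from itertools import combinations

# Event times in minutes after midnight EDT, as quoted in the text.
events_min = sorted([9 * 60 + 3,     # 9:03  second WTC strike
                     9 * 60 + 43,    # 9:43  Pentagon strike
                     10 * 60 + 5,    # 10:05 collapse of tower 1
                     10 * 60 + 28])  # 10:28 collapse of tower 2

lags = sorted(b - a for a, b in combinations(events_min, 2))
print(lags)  # [22, 23, 40, 45, 62, 85] minutes, as in the text
```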

To see this more clearly, I fit an analytic trig series to the data and differentiate. The series is truncated to get a smoothing effect. The differentiated function has peaks where the cumulative sum of the autocorrelation has a strongly rising slope. The figure below shows the positive peaks of the fit for the 12-hour window from 7am to 7pm, overlaid with the six lag times mentioned above. The result looks quite striking.

[Graphics:autocorr9=11gr4.gif]
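[Note: a sketch of that smoothing-and-differentiation step, under my own assumptions: a trigonometric fit done via FFT, with the harmonic cutoff chosen by eye, and the series treated as periodic, so the endpoints are unreliable.]

```python
import numpy as np

def smoothed_derivative(s, n_harmonics=40):
    """Fit s with its lowest Fourier harmonics and return the analytic
    derivative of the truncated series; peaks mark strongly rising
    regions of the cumulative autocorrelation sum."""
    n = len(s)
    coeffs = np.fft.rfft(s)
    coeffs[n_harmonics + 1:] = 0.0           # truncate: keep low harmonics
    k = np.arange(len(coeffs))
    dcoeffs = coeffs * (2j * np.pi * k / n)  # d/dt of exp(2*pi*i*k*t/n)
    return np.fft.irfft(dcoeffs, n)
```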

However, things are not as simple as they seem. I've looked at successive 8-hour windows to try to zero in on the effect. Oddly, the close match of autocorrelation derivative peaks to the empirical lag times is maintained even for later windows that exclude the event (i.e., 8-hour data windows from 11am-7pm out to 1pm-9pm). Below is the result for 8 hours of data windowed from 11am to 7pm. (The x-axis is now in minutes of lag time; multiply by 12 to convert to the scale of the figure above, since one point there is 5 seconds.) You can see that removing the portion of the raw data around the event (7-11am) has not altered the situation much.

[Graphics:autocorr9=11gr5.gif]

So the match of autocorrelation structure with the timing of events may be a fluke. A couple more plots make for a cautionary tale. Here is the plot if I take the 8-hour window from 7am to 3pm, that is, including the event. The matches are not nearly as good, and some extra peaks occur.

[Graphics:autocorr9=11gr6.gif]

For the window from midnight to 8 am there is a mix of matches and misses:

[Graphics:autocorr9=11gr7.gif]

So caution is the byword here. One could also quibble about the timing of events (the Pentagon strike in particular: is it better to use the moment of the crash or the time the news broke on the networks a few minutes later?). Nevertheless, if there is any validity to this approach, it might be that 'impulses' have long tails, so that fine-scale correlations in the data are observable even for windows that exclude the event in a temporal sense. Perhaps the windows centered on the event itself contain correlations from both the event and prior, premonitory influences. Windows just beyond the event might be cleanly registering the tails of the event in a region where more distant "pre-event" correlations have nearly completely decayed. I will be playing with this more. The first thing is to look at data from a few days before the 11th.

To finish, I've plotted below the cumulative sums of autocorrelations for a selection of windows. It is particularly interesting to contemplate the series of 4-hour windows. The lag times are in seconds. The 4-hour window autocorrelations are extended out to 4000 secs (67 minutes). This is pushing things a bit for a window length of only 14,400 secs, since only 10,400 overlapping seconds remain at the longest lag, but the plots are pretty...

[Graphics:autocorr9=11gr8.gif][Graphics:autocorr9=11gr9.gif][Graphics:autocorr9=11gr10.gif][Graphics:autocorr9=11gr11.gif] [Graphics:autocorr9=11gr12.gif][Graphics:autocorr9=11gr13.gif][Graphics:autocorr9=11gr14.gif][Graphics:autocorr9=11gr15.gif]

Detailed Explanation of Procedures


Date: Sun, 30 Sep 2001 09:40:57 +0100
Subject: Re: Draft report
From: Peter Bancel 

I have a new draft which addresses some of the questions you asked. The
large rise in the autocorr 8-12 figure can be understood as coming from a
large excursion in the cumulative deviation of the z-scores (sec-by-sec
Stouffer's z, not the z^2) which occurs from 9:50 to 11:50. This positive
excursion, as an isolated data set, has a two-tailed p-value of
2 x 10^-4 (z = 3.71). So it's strong and it lasts for 1.9 hours. Placed
in the context of a 24-hour data window, I guess a Bonferroni correction
would put the p-value at 2.5 x 10^-3 (*see below).
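[Note: the quoted figures are consistent; a quick check with SciPy:]

```python
from scipy.stats import norm

p = 2 * norm.sf(3.71)   # two-tailed p for z = 3.71: ~2.1e-4
print(p, p * 24 / 1.9)  # rough 24h/1.9h correction: ~2.6e-3
```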

Since the autocorrelation I examine is a self-convolution of the cumulative
deviation of the mean, broad features in the one lead to broad features
in the other. I don't have a better way to say it yet, but the new draft
report has a couple of figures that make the point quite nicely.

I also look at separating the fine structure (i.e., the plots of the
differentiated autocorr) from the gross structure (the big autocorr rise
around 8-12). I think these can be separated in analysis. Again, there are
some plots to show this.

This point is important in the light of potential outliers in the
sec-by-sec data (I wish I had a good name for this; I mean the dataset
constructed by calculating Stouffer z's across regs for each second). There
are three outliers of note in the 24-hour period of Sept 11: at seconds
10009, 16405 and 36767. The last one falls at 10:12:47 EDT and has a z-score
of 4.8. Removing this one second of data would significantly change some of
the analysis that has been done, so that's food for thought (the "heroic
struggle" analysis certainly, but also windowed analyses of short window
length). Ditto for the other outliers, which occur at 2:46:49 and 4:33:25,
respectively. The large Stouffer z-score at second 36767 occurs because
nearly all of the 36 regs give a positive trial deviation for that one
second; there do *not* seem to be isolated regs that are misbehaving. I
haven't yet looked in detail at the other outliers, but they have z-scores
of around 4.6 and 4.4.
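[Note: converting those second-of-day indices to clock time reproduces the quoted EDT times; a small illustrative helper, not from the report:]

```python
def sec_to_clock(s):
    """Seconds after midnight -> H:MM:SS clock string."""
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"

for s in (10009, 16405, 36767):
    print(s, sec_to_clock(s))   # 2:46:49, 4:33:25, 10:12:47
```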

To get back to your question about a z-score for the 8-12 analysis, I'm
not sure, but maybe we can come up with a reasonable approach. The same
question could perhaps be addressed more easily for the cumdev of the mean
of the sec-by-sec data and its excursion from 9:55 to 11:50 am.

cheers,

-Peter

(*) [I've never managed to find an explanation of Bonferroni's correction
in the textbooks, so I'm only guessing that I multiply the p-value of the
isolated event by 24/1.9 ≈ 12.6, which takes 2 x 10^-4 to roughly 2.5 x 10^-3.]

[Peter was, in part, responding to these questions:]

> What do you think of the 8-12 figure that has a big, steady
> departure?  In the context of multiple analyses, or data
> snooping, what sort of correction factor would need to be
> applied to the apparent significance?  What is the Z-score?
> Given that it corresponds closely to the formal prediction of
> 8:35 to 12 (which should have been 8:25 to 12) what would you
> say its evidentiary or confirmatory value is?  I would like
> to put that fig on the main Sept 11 page to represent another
> independent analyst's look at the attack, and to provide a
> good place for a link to your full report.