.
             return to EL Knife

return to EL Home

                          TM
EL Knife          v1.1

A spreadsheet utility for reconfiguring multidimensional data files

The familiar spreadsheet stores data in two dimensions: rows and columns. To accommodate more complex designs, there are standard conventions for fitting multiple data dimensions into a single spreadsheet dimension. However, as there is always more than one way to configure such a data set, researchers sometimes have to reconfigure the data into a different format required for a particular analysis. This can happen when data generated through the use of software are not correctly configured or when data have been entered into the computer before the design of the analysis is finalized. More commonly, this problem arises when using two different statistical packages to perform different analyses, or when a particular graph is needed. Changing the configuration is often done by manually re-entering the data, cut and paste editing, or by an ad hoc program, all of which carry the risk of introducing errors. EL Knife automates the reconfiguration process. It loads data into a virtual multidimensional matrix and presents the user with an interactive configuration map that can be used to specify any possible spreadsheet configuration. In short, it gives a researcher the freedom to acquire or enter data in whatever form is most convenient, without worrying about how it is to be used.

Data may be divisible by up to 4 factors, each of which may have an unlimited number of levels. In most data sets, case (subjects) uses one of these four dimensions. Data must be either numbers containing 1 to 10 digits or single text characters. The two types cannot be mixed. Multi-character text (words) can be used as factor names and level labels (but not as data)

Files must be tab separated spreadsheets saved as plain text files. They can be created with spreadsheet software such as Microsoft Excel or most statistical programs. Even a word processor will work if you hit the tab key once after each entry. Remember to save the file as plain text rather than in the program's proprietary format.


Using EL Knife

1. Attach a header. To properly parse the configuration of the input file, EL Knife needs a header in the first row to tell the program what to look for in each column. Look at each of the examples below to learn how to write a valid header. Formal specifications can be found under Header Rules in the Technical Details section at the end of this document.

2. Open and Verify the File. Open the file with EL Knife and look at the configuration map to see if the file has been properly interpreted. The map should show a marker block for each factor. The block shows the number of levels followed by the factor name. To view the data file directly, choose View datfile from the File menu.

If the map is not as expected, there may be a problem with the header or the data. .For example, if EL Knife cannot find a header or doesn't understand it, you will see just two blocks representing the rows and columns. If the columns are not separated by tabs, you will see a blank map. If you see extra, unexpected blocks, you may have an accidental duplication in the file or perhaps two subjects with the same identifier.

3. Change the configuration. To change the way that the data factors are packed into the two dimensional spreadsheet, simply click on factor blocks and drag them to different locations on the map. Or click in the small square at the left end of a block to tun it off. There are three types of changes possible:

   

a. Moving factors from rows to columns (or vice versa.) A simple example would be turning the rows into columns and the columns into rows, rotating the spreadsheet 90 degrees (see example 1).

b. Changing the hierarchical order of factors. If there are multiple factors in either the row or column dimension, the one listed first will be the one whose levels change most slowly as you move across or down the spreadsheet. Drag the blocks to arrange them whichever order you wish.

c. collapsing factors. If you click on the close box at the left end of a factor block, the block will turn white and the program will collapse that factor, averaging together to single values, all sets of data points that share the same levels of all factors still active.

4. Write the result to an output file. The name to be used for the output file is shown in the menu bar. Click on the name to edit it. EL Knife goes to considerable effort to prevent using the same name as an existing file.(See Output File Names, below.) Click on the OK button in the menu bar. The program will write a new file and turn off the button. (On some systems it beeps when done.) You should then choose View Outfile from the File menu and make sure the new file meets your needs.


Examples

Example 1. a simple 2 dimensional spreadsheet with data from 3 participants, each of which have been tested 4 times. Note that the data columns are headed with a factor name (trial) followed by a level label (1-4). The column that contains subject labels (1-3) is headed with just a factor name (Subj)

Subj
1
2
3
trial1
2.31
3.55
2.7
trial2
2.52
2.61
2.45
trial3
2.01
2.7
2.52
trial4
1.97
2.43
2.04

EL Knife will read the file and show the following configuration map.

To interchange the row and column dimensions, drag the subject block from the row grid up to the column grid, and drag the trial block down to the row grid like so:

Click on OK, then choose View outfile from the File menu . . .

. . . to see the reconfigured output spreadsheet.

trial
1
2
3
4
Subj1
2.31
2.52
2.01
1.97
Subj2
3.55
2.61
2.7
2.43
Subj3
2.7
2.45
2.52
2.04

Another possibility: If you drag the Subj block down to the second row position,

You will get a spreadsheet where both factors change as you move down the rows and there is only a single column of data. Note that the trial factor, placed in the first row position, changes levels more slowly than the subj factor. (If there is only a single column of actual data and it is the last column on the right, its header may sometimes be omitted. See Header Rule #8 below.)

trial
1
1
1
2
2
2
3
3
3
4
4
4
Subj
1
2
3
1
2
3
1
2
3
1
2
3

2.31
3.55
2.7
2.52
2.61
2.45
2.01
2.7
2.52
1.97
2.43
2.04

Example 2. a 3 dimensional data set with 12 subjects (2 groups x 2 genders x 3 subjects/group) Note that the columns which carry labels for the three factors are headed by factor names while the data column is headed by a factor name (Dat) followed by a level label (1).

Group
1
1
1
1
1
1
2
2
2
2
2
2
Gend
M
M
M
F
F
F
M
M
M
F
F
F
Subj
BB
WK
JM
CK
RB
CH
TB
RW
SC
KR
BC
JH
Dat1
2.34
2.21
2.84
2.97
2.88
2.93
1.63
2.14
2.21
1.81
2.01
1.38

Note that this file will be loaded into a 2 x 2 x12 matrix that wastes memory with 3/4 of its cells empty. This is because the program is reserving cells for all possible combinations of the levels of the three factors but each subject belongs to only one group and one gender.

In order to use a more efficient 2 x 2 x 3 matrix, you should delete the column which holds the unique subject ID codes. Or you can cause the program to ignore it by placing an ! exclamation point as the first character of the column's header.

Group
1
1
1
1
1
1
2
2
2
2
2
2
Gend
M
M
M
F
F
F
M
M
M
F
F
F
!Subj
BB
WK
JM
CK
RB
CH
TB
RW
SC
KR
BC
JH
Dat1
2.34
2.21
2.84
2.97
2.88
2.93
1.63
2.14
2.21
1.81
2.01
1.38

When El Knife opens this file, it will detect the presence of the unlabeled factor (subjects). It does this by noticing, for example, that the combination Group 1 Gend M occurs 3 times. In order to differentiate these repetitions, it creates a new factor. Of course, it doesn't know what the factor should be named or what labels to use for its 3 levels, so it calls it 'RowFactor?' and numbers the levels in order of appearance i.e., the first time Group 1 Gend M appears, it is labeled 1, the second time is labeled 2, etc. (For more on this, See Automatic Detection of Unlabeled Factors, below)

To change a factor name, doubleclick and edit where it appears under the menubar

If you need each group's data in a separate column, re-arrange the factors like so:

Note that the output file header (below) abbreviates the factor names to prevent excessive column width. If you don't want this abbreviation, switch to Full factor names (Option menu) before clicking on the ok button.

Note also that level labels in the header are in parentheses. This is because the Gender labels (M, F) are not numbers and need to be separated from the factor name

Subjects
1
2
3
Gr(1)Ge(M)
2.34
2.21
2.84
Gr(1)Ge(F)
2.97
2.88
2.93
Gr(2)Ge(M)
1.63
2.14
2..21
Gr(2)Ge(F)
2.81
2.01
1.38

To Collapse a factor: click in the small square at the right of the block. When a block is turned off, the program collapses a dimension of the data matrix by averaging together sets of cells differentiated only by having different levels of that factor.

This configuration with Subjects turned off produces an output file with a 2x2 table of means for each of the four groups.

Gend
M
F
Group1
2.46
2.93
Group2
1.99
2.07


Example 3. An experiment wherein 2 different stimuli (A,B) are presented to each subject twice (Trials 1,2). There are 6 subjects, divided into 2 groups (control,experimental). (2 groups x 3 subjects/group x 2 stimuli x 2 trials )

Group
c
c
c
e
e
e
Stim(A)Trial(1)
2.2
2.4
2
2.3
2.7
2.6
Stim(A)Trial(2)
2.4
2.7
2.6
2.6
2.5
2.8
Stim(B)Trial(1)
1.9
2.2
2.2
2.1
2.2
2.4
Stim(B)Trial(2)
2.1
2.5
2.8
2.7
2.9
2.5

The program detects the unlabeled subject factor. (Double click on the Rowfactor? block and change the name to Subj.)

There are several hundred different output configurations possible for this file. Here are just three:

First, collapse trials to highlight the interaction between Group and Stimuli.

Group
c
c
c
e
e
e
Subj
1
2
3
1
2
3
Stim(A)
2.3
2.55
2.3
2.45
2.6
2.7
Stim(B)
2
2.35
2.5
2.4
2.55
2.45

Second, collapse both trial and stimuli to isolate the effect of the group factor.

Group
c
c
c
e
e
e
Subj
1
2
3
1
2
3

2.15
2.45
2.4
2.43
2.58
2.58

Finally, change Group from a row factor to a column factor, so that each of the 8 experimental conditions will be in a separate column.

Subj
1
2
3
G(c)S(A)T(1)
2.2
2.4
2
G(c)S(A)T(2)
2.4
2.7
2.6
G(c)S(B)T(1)
1.9
2.2
2.2
G(c)S(B)T(2)
2.1
2.5
2.8
G(e)S(A)T(1)
2.3
2.7
2.6
G(e)S(A)T(2)
2.6
2.5
2.8
G(e)S(B)T(1)
2.1
2.2
2.4
G(e)S(B)T(2)
2.7
2.9
2.5


Other controls

Decimal places

By default, EL Knife rounds all numbers to 2 decimal places. To change this, choose decimal places from the Option menu and select any value for 0 to 9. Keep in mind, however, that the program is limited to numbers with 10 digits. For all data values, the number of digits to the left of the decimal point plus the number of decimal places selected for the output file must not exceed 10. This is true even if not all the decimal places are actually being used. For example if you set decimal places to 6, the number 34451.1 will cause an overload error. If an overload occurs, the affected cell will be filled with a row of ****** and a warning message will appear just below the menu bar.


Ignore rows or columns of the input file.

To ignore a column, place an exclamation point ! as the first character of that column's header. To ignore a row, place an ! as the first character of the row. The top row of the file cannot be ignored.


Output file names

EL Knife will take the name of the input data file and add .out to it to make a name for the output file. This is shown in the menu bar. If the input file already ends in .out, then it will replace the t with a 2 to get .ou2. If this name already exists in the current directory, it will try .ou3, .ou4 etc. If it gets to .ou9 and still doesn't find an unused name, it will warn you that "file already exists." You can click on the outfile name and edit it but the name you choose must end in .out or .ou followed by a single digit. If you want to output a second version of the same data in the same session, you must change the name in some way for or else the program will assume you want to overwrite the earlier version.


Level labels: text vs. order numbers

El Knife treats the labels for the levels of all factors as if they were just text labels, whether or not they are composed of letters, numbers or both. In other words, it will not use numbered labels to sort the data into order. Instead the levels are assigned positions in the data matrix in order of first appearance as the spreadsheet is loaded. In other words the first subject encountered gets first position even if he/she is labeled subj13.

There are situations in which you may want to eliminate the given labels. For example, some statistical programs cannot open files that contain non-numeric characters. Or you might want to remove subject names to protect confidentiality. Switch from Text labels to Order labels by choosing Level Labels from the Option menu. The program will then replace all level labels with numbers denoting order of first appearance. And all level labels will now be in consecutive order. (Remember however, that if the original labels were numbers based on some other principle than order of appearance, the labels will change in a non-obvious and possibly confusing manner.)


Suppress Header

The Include Header command in the Options menu allows you to turn the header off and on. When set to NO, Output files will be written without the header. This is for use with statistical software that won't load files containing non-numeric characters. It is also useful when combining a number of separate files together (see next item, below).


Combining a number of separate files together

If you have a series of separate data files that need to be reconfigured and combined into a single file, you may be able to accomplish both with a single procedure. Open the first file, arrange the desired configuration and click on OK to create the output file. Then, switch outfile from Replace to Add (outfile menu). Then open each successive file, click on the outfile name and change it to the first outfile name before clicking on OK. The new data will be added to the bottom of the file. You can choose View Outfile from the File menu to make sure it is working as intended.

The add function always adds data to the bottom of the file, creating additional rows to hold it. It will not add by creating new columns. Remember, however, that you can add data as rows and later reconfigure it so that these rows become columns.

Be cautious with the Add option, as it will allow you to change pre-existing files.




Technical Details



Requirements

Macintosh with System 7, 8, 9, or OS X Classic; PC with DOS, Windows 3.1, 95, 98, Millenium, or XP home. (It has not been tested with NT, 2000, or XP pro)



Cross platform operation

The Macintosh and PC versions of EL Knife will open each other's files. They should also open any plain text file created on the other platform.



Header Rules

1. A column that contains data (dependent variable) should be headed by factor name(s) followed by level label(s).

2. A column that contains level labels should be headed by their factor name.

3. Factor names should consist of letters with no numeric characters.

4. Level labels may consist of any characters, but if any contain letters or other non-numeric characters, then all must be enclosed in (parentheses) to separate them from factor names.

subject
AD
DF
trial1
2.3
3.0
trial2
2.6
3.1

                  vs.                  

subject
AD
DF
trial(a)
2.3
3.0
trial(b)
2.6
3.1

5. If it is absolutely necessary to use factor names that contain numbers, this can be done only if all factor names are followed by () wherever they are used, even if they are not followed by level labels.

subject5()
AD
DF
trial(1)
2.3
3.0
trial(2)
2.6
3.1

6. When there are two or more column factors, the factor name and level label for each must appear at the head of each data column

subject
1
2
Stim(AA)trial(1)
224
231
Stim(AA)trial(2)
251
244
Stim(BB)trial(1)
213
220
Stim(BB)trial(2)
289
270

7. When there are two or more row factors, each must be identified by a separate column containing its level labels and headed with its factor name

Group
exper
exper
exper
control
control
control
Subject
1
2
3
1
2
3

2.2
2.3
1.9
2.3
2.7
2.5

Note: you cannot combine two designations in the same column. The following will not be properly interpreted.

Group
ex1
ex2
ex3
con1
con2
con3

2.2
2.3
1.9
2.3
2.7
2.5

8. You can omit the header for a single column of data if it is the last column on the right. This is because a single column is not really a factor and does not need a factor name or level label.

s
1
2
3
4
gender
M
F
F
M
group
1
1
2
2

25.2
20.3
12.1
11.4

However, if the file contains another column that has been blocked out with an exclamation point or there are additional columns to the right of the data column, then the program may have trouble finding the data column unless it has a factor name and level label, e.g. Dat1

s
1
2
3
4
!gender
M
F
F
M
group
1
1
2
2
Dat1
25.2
20.3
12.1
11.4
!comments
ok
recheck
ok
ok


Automatic Detection of Unlabeled Factors

If the program encounters multiple rows (or cols) with the same combination of factor levels, it assumes that there is another unlabeled factor needed to differentiate the rows. It creates a new factor, called RowFactor? and assigns numbered level labels in order of appearance. Thus the first time a combination appeared, it would get a 1, the second time a 2. EL Knife will detect up to two unlabeled factors in a spreadsheet, one in the rows and one in columns.

In the row dimension, it will detect an unlabeled factor even if there are no labeled factors. This comes in handy when you are loading a file where each case takes a single row in the spreadsheet and there is no case-identifying factor. The program will create a factor and number the cases in order of appearance.

For example, it will read this file as having 3 RowFactor? and 4 tr. ( RowFactor? should be changed to a more meaningful name e.g. Case or Subj)

tr1
12
10
9
tr2
13
9
10
tr3
11
12
10
tr4
15
8
9

However, in the column dimension, it will detect an unlabeled factor only if there is at least one labeled factor which has more than one level.
So It will read this as having 3 Subj, 2 D, and 3 ColFactor?

Subj
1
2
3
D1
7
3
2
D1
3
5
4
D1
5
3
2
D2
4
5
3
D2
3
3
4
D2
4
6
7

but will fail to properly understand either of these . . .

Subj
1
2
3

5
4
6

4
4
5

6
3
8
                Subj
1
2
3
D1
5
4
6
D1
4
4
5
D1
6
3
8

Duplication Any duplication such as repetition of a trial or having two subjects with the same subject code will cause El Knife to auto-detect an extraneous row or column factor. This can double the size of the data matrix (with mostly empty cells.) Always check for such duplications before opening a file.


Miscellaneous Caution: Factors with only 1 level. Never include a factor that spans multiple columns or rows but has only one level. This chokes the system, causing the file to load incompletely, ignoring data or even entire factors. If you have factors like this. Delete or ! them before loading.

For example, this file will not read properly unless the first column is deleted

gr
1
1
1
dat1
4
3
6
dat2
6
4
8

It is, however, ok to have a single level for a column factor when there is only one column of data. This file will load, showing: 2 levels of gr and 3 levels of an unlabeled RowFactor?

gr
1
1
1
2
2
2
dat1
4
3
6
3
2
2

Finally, a single column of data will not load because there must be at least one labeled factor with more than level.

So this won't load . . .                but this will.


4.1
3.3
5.4
3.2
5.1
                             S
1
2
3
4
5

4.1
3.3
5.4
3.2
5.1



Order of columns and rows

The examples given above have factors and their various levels packed into the columns and rows in neat hierarchical order. However, the matrix location to which each data value is written is determined by the label, so that changing the order of columns and rows in the spreadsheet will not cause misplacement. (It may affect which levels of a particular factor occur first and second and are tagged level 1, level 2, etc.





License Agreement & Disclaimer


Users of EL Knife software must agree to the following conditions:

1. The author provides this software "as is" with no explicit or implied warranty. It is NOT guaranteed to be free of bugs. The user is responsible for testing to determine if the software is accurate and safe for the user's intended purpose. Each user must agree that the author is not liable for any damage or loss experienced through the use of EL Knife.    ALWAYS BACK UP DATA FILES BEFORE PERFORMING ANY TRANSFORMATIONS

2. Each user may download one copy of EL Knife for personal use, but NOT for further copying, placement on public or private servers, or any other form of distribution. (Institutions wishing to distribute El Knife to their personnel may contact the author to obtain a site license.)

EL Knife (TM) software and documentation are copyright 2000 by W. K. Beagley The author reserves all rights. EL Knife is NOT shareware, freeware, or public domain software.


Source and Support

Read more about EL Knife in:   Beagley, W. K. (2001). Why we need more psychology programmers/ EL Knife, a data utility for transforming spreadsheets. (Presidential address to the Society for Computers in Psychology, New Orleans, Nov. 2000) Behavior Research Methods, Instruments, & Computers, 33(2), 97-101.

EL Kinfe may be downloaded from: www.alma.edu/el

Questions and suggestions for improvement should be sent to: beagley@alma.edu