Cell Arrays and Structures


Limitations of Vectors

MATLAB deals with vectors and matrices naturally. Such data structures are ideal for many programs we'd wish to write. However, both assume that every piece of data we'd wish to use will have an identical type. As programs become more complex we may wish to associate different types of data with one another. Doing this with vectors alone is possible but clunky, so there exist tools specifically to handle these cases.

Text Data

One place this is especially apparent is with text data. Suppose you wanted to store three names in an array. If you were to give the following command, you'd get an error:

names = [ 'Alice'; 'Bob'; 'Claire' ]
Dimensions of matrices being concatenated are not consistent.

What's happening? The syntax we used above seems like it should work just fine: we want an array, where each element of the array is a different name. That's not how MATLAB sees the command. Remember that the square bracket notation with semicolons is trying to create a matrix with three rows, and also remember how MATLAB stores text strings as sequences of ASCII characters. That is, 'Alice' is really the sequence:

>> uint8('Alice')

ans =

  1×5 uint8 row vector

    65   108   105    99   101

So really, the names command above is trying to create a matrix where the first row has five characters, the second row has three characters, and the third row has six characters. Hence the explanation of the error message: MATLAB requires our matrices to be rectangular, and it can't figure out what to do when you try to build a matrix out of rows with three different lengths. The bad solution to this problem is to force all text to be the same length. We can do this by padding each name with extra spaces. Note how the single quotes marking each string line up:

names = [ 'Alice '; 
          'Bob   '; 
          'Claire' ]

This creates a matrix with three rows and six columns. This kind of works, but it's not great. Accessing a single name now requires accessing a row and all columns within that row:

>> names(2,:)

ans =

Bob

However, this really is just a matrix. For example, you can access a single column of it, which is valid syntax but unlikely to be something you'd ever want to do:

>> names(:,3)

ans =

i
b
a

The bigger problem is that all names must now be stored as the same length strings. What happens if we want to add 'Alfredo' to our list of names (which has seven characters)?
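
A minimal sketch of the 'Alfredo' problem, assuming the padded names matrix from above. Appending a seven-character row to a six-column matrix fails, so every existing row would have to be re-padded. The built-in char() function does that padding when it is given several strings at once. The variable names appendFailed, numRows, and numCols are illustrative only.

```matlab
% Padded 3-by-6 character matrix from the example above
names = [ 'Alice '; 'Bob   '; 'Claire' ];

% Appending a 7-character row to a 6-column matrix raises an error
appendFailed = false;
try
    names = [ names; 'Alfredo' ];
catch
    appendFailed = true;
end

% char() pads every row with trailing spaces to the longest length
names = char( 'Alice', 'Bob', 'Claire', 'Alfredo' );
[numRows, numCols] = size( names );
```

Every row is now seven characters wide, so 'Bob' carries four trailing spaces. The padding still has to be redone whenever a longer name arrives.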

Aggregate Data

More generally, we'd like to associate different data types with one another. For example, suppose we had several characters and we'd like to associate each with a numeric value. We might want to come up with a matrix such as:


M = [ 50  'A' ;
      100 'B' ;
      150 'C' ]

However, MATLAB doesn't know quite how to deal with this. It sees the presence of the characters 'A', 'B', and 'C' and automatically tries to interpret all of the data as characters. If you enter the above into your command window you'll get a nonsense result out the other side, which comes from interpreting 50 as ASCII character '2', 100 as character 'd', and so on.

M =

   2A
   dB
   -C
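
The conversion can be checked directly. The sketch below uses only the first two rows of M and the built-in class(), char(), and double() functions. The variable names resultClass, codeAsChar, and charAsCode are illustrative.

```matlab
% Mixing numbers and characters in square brackets yields a char array
M = [ 50  'A';
      100 'B' ];
resultClass = class( M );      % 'char'

% The numbers were reinterpreted as character codes
codeAsChar = char( 50 );       % the character '2'
charAsCode = double( 'd' );    % the code 100
```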

The easiest solution might be to just declare two different vectors:

ints  = [ 50;  100; 150 ]
chars = [ 'A'; 'B'; 'C' ] 

This is a reasonable workaround, but we must always remember to index both vectors together for it to work. It is also imperfect: some sensible operations become awkward. Suppose we had a function called shuffle that shuffles an array:

shuffle = @(v) v(randperm(length(v)))

We can shuffle each of our two arrays, but shuffle has no way to keep each pair of values together. If we shuffle both arrays individually we will usually get mismatched results:

>> shuffle( ints )            >> shuffle( chars )

ans =                         ans =

     150                           B
      50                           C
     100                           A
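
The shuffle function cannot keep the pairs together on its own. One way to do it with plain vectors, sketched here under the assumption that both vectors always have the same length, is to draw a single permutation and apply it to both. The names idx, shuffledInts, and shuffledChars are illustrative.

```matlab
ints  = [ 50;  100; 150 ];
chars = [ 'A'; 'B'; 'C' ];

% Draw one random permutation of the indices 1..3
idx = randperm( length( ints ) );

% Apply the same permutation to both vectors so the pairs stay aligned
shuffledInts  = ints( idx );
shuffledChars = chars( idx );
```

This works, but nothing enforces that the two vectors are always permuted together. The bookkeeping is left entirely to the programmer.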

Cell Arrays

Cells are containers that may hold arbitrary data. An array of cells thus gives a programmer a single array-like structure whose elements can be of entirely different types. This is a great way to work with real-world data that doesn't fit into nice, rectangular matrices of numeric values.

Consider our text data problem above. What we really want is an array of names, without regard for how character data is converted to ASCII and stored in MATLAB. With a cell array this is easy. Note that we're using curly braces instead of the square brackets that we tried above:

%Declare cell arrays using curly braces
>> names = { 'Alice'; 'Bob'; 'Claire' }

names =

  3×1 cell array

    'Alice'
    'Bob'
    'Claire'

That was easy! This declares a cell array with three rows and one column. Declaring matrices of cells works the way you would expect as well, with cells in the same row being separated by spaces (or commas) and cells in successive rows being separated by semicolons:

%Declare a matrix of cells using curly braces
>> names = { 'Alice'  'David';
             'Bob'    'Erica';
             'Claire' 'Fred' }

names =

  3×2 cell array

    'Alice'     'David'
    'Bob'       'Erica'
    'Claire'    'Fred' 

Observe carefully that row and column indices now refer to cell positions and not to the underlying data. For example, if we ask for row 2 and column 1 we get the whole cell containing 'Bob'.

>> names(2,1)

ans =

  cell

    'Bob'

Furthermore, note that the type of data returned is not a character string, but a cell. The major downside of cell arrays is that suddenly everything is a cell, and working with cells requires explicit conversions between cells and the other data types (scalar numbers, text strings) that we're used to working with. Parenthesis indexing does not give us direct access to the data inside a cell. For example, if we wanted to access the second character of 'Bob', the obvious approach does not work:

>> cellVar = names(2,1)

cellVar =

  cell

    'Bob'

>> cellVar(2)
Index exceeds matrix dimensions.

What happened? The variable cellVar is a 1x1 cell. There is no second element to index. What we really want is the text string that's contained in the cell:

>> cellVar = names(2,1);
>> strVar = char( cellVar ) %Convert cell to string

strVar =

Bob

>> strVar(2)

ans =

o
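
Besides converting with char(), MATLAB's curly-brace indexing returns the contents of a cell rather than the cell itself. A short sketch, reusing the names array from above (secondChar is an illustrative variable name):

```matlab
names = { 'Alice'  'David';
          'Bob'    'Erica';
          'Claire' 'Fred' };

cellVar = names(2,1);          % parentheses return a 1x1 cell
strVar  = names{2,1};          % curly braces return the char array 'Bob'

% Curly-brace indexing can be followed by ordinary indexing
secondChar = names{2,1}(2);    % the character 'o'
```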

Assigning to cell arrays has very similar problems. Suppose we want to change 'Fred' to 'Francine'. Simple assignment doesn't work:

names(3,2) = 'Francine'
Conversion to cell from char is not possible.

The solution is to explicitly convert the text string 'Francine' to a cell before assignment:

>> names(3,2) = cellstr('Francine')

names =

  3×2 cell array

    'Alice'     'David'   
    'Bob'       'Erica'   
    'Claire'    'Francine'
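
Two other spellings of the same assignment are sketched below. Either assign to the contents with curly braces on the left, or keep the parentheses and wrap the right-hand side in a cell. The copies viaBraces and viaParens are illustrative names used only to compare the two forms.

```matlab
names = { 'Alice'  'David';
          'Bob'    'Erica';
          'Claire' 'Fred' };

% Curly braces on the left assign directly to the cell's contents
viaBraces = names;
viaBraces{3,2} = 'Francine';

% Parentheses on the left need a cell on the right-hand side
viaParens = names;
viaParens(3,2) = { 'Francine' };

sameResult = isequal( viaBraces, viaParens );   % true
```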

Cell Array Functions

Most functions involving cell arrays are designed to convert to and from cells. The functions iscell() and iscellstr() can be used to figure out what kind of data you're working with.
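
A brief sketch of how the two test functions differ. iscell() is true for any cell array, while iscellstr() is true only when every cell holds text. The variables names and mixed, and the result names, are illustrative.

```matlab
names = { 'Alice'; 'Bob'; 'Claire' };   % every cell holds text
mixed = { 50 'Alice' };                 % a number and some text

namesIsCell    = iscell( names );       % true
namesIsCellstr = iscellstr( names );    % true
mixedIsCell    = iscell( mixed );       % true
mixedIsCellstr = iscellstr( mixed );    % false: one cell holds a number
textIsCell     = iscell( 'Alice' );     % false: a plain char array
```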

Aggregate Data with Cell Arrays

As mentioned above, a huge advantage of cell arrays is the possibility of having mixed data types in the same data structure. As before, declaring this kind of aggregate data is easy:

>> data = { 50  'Alice';
            100 'Bob';
            150 'Claire' }

data =

  3×2 cell array

    [ 50]    'Alice' 
    [100]    'Bob'   
    [150]    'Claire' 

This makes it easy to associate different types of data with one another, and to treat those data as a single unit:

%Exchange rows of data
>> temp = data(2,:)

temp =

  1×2 cell array

    [100]    'Bob'

>> data(2,:) = data(3,:);
>> data(3,:) = temp;
>> data

data =

  3×2 cell array

    [ 50]    'Alice'
    [150]    'Claire'
    [100]    'Bob'

However, one must be mindful of converting to and from the underlying data types:

>> data(2,1) = num2cell( 200 );
>> data(2,2) = cellstr( 'David' )

data =

  3×2 cell array

    [ 50]    'Alice' 
    [200]    'David' 
    [100]    'Bob'
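
Because each row of the cell array is a single unit, the pair-shuffling problem from earlier becomes a one-line row permutation. The sketch below starts from the original data and also uses the built-in cell2mat() to pull the numeric column back out as an ordinary vector. The names rowOrder, shuffled, and values are illustrative.

```matlab
data = { 50  'Alice';
         100 'Bob';
         150 'Claire' };

% Permute whole rows so each number stays with its name
rowOrder = randperm( size( data, 1 ) );
shuffled = data( rowOrder, : );

% Convert the first column from cells back to a numeric column vector
values = cell2mat( shuffled(:,1) );
```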

Structures

Structures, called structs, are an alternate method for handling aggregate data in MATLAB. While cell arrays tend to be somewhat MATLAB-specific in their semantics, almost all programming languages support something very similar to structs. Structs allow you to store and reference data in named fields.

The easiest way to create a one-off structure is with the struct() function. Called with no arguments, it creates a structure with no fields. Fields can then be added by assigning to them with "dot notation".

>> person = struct();
>> person.name = 'Alice Anyperson';
>> person.age  = 22;
>> person.zip  = 63103;
>> person.edu  = 'BS Computer Science';
>> person

person = 

  struct with fields:

    name: 'Alice Anyperson'
     age: 22
     zip: 63103
     edu: 'BS Computer Science'
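
The struct() function also accepts field names and values in pairs, which builds the same structure in one statement. The sketch below additionally uses the built-in isfield() and fieldnames() to inspect the result. The variables hasAge and fields are illustrative.

```matlab
% Build the same structure from name/value pairs
person = struct( 'name', 'Alice Anyperson', ...
                 'age',  22, ...
                 'zip',  63103, ...
                 'edu',  'BS Computer Science' );

hasAge = isfield( person, 'age' );   % true
fields = fieldnames( person );       % 4x1 cell array of field names
```

One caveat: when a value passed to struct() is itself a cell array, MATLAB creates a struct array with one element per cell, so this shortcut is simplest with scalar and text values.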

Creating an array of structs is less intuitive than it could be. Extending our first struct into an array with an empty struct doesn't work exactly the way you'd hope (even though we could extend a scalar value into an array with a similar approach).

>> person(2) = struct()
Subscripted assignment between dissimilar structures.

This is because MATLAB only allows an array of structures with identical fields. The first element of the array has four fields: name, age, zip, and edu. However, the empty structure returned by struct() has no fields. The way to create the subsequent structures that we'd like is to simply access them with some field that exists in the first struct.

>> person(2).name = 'Bob Programmer';
>> person(3).age = 21;
>> person(4).edu = 'MS Mechanical Engineering'

person = 

  1×4 struct array with fields:

    name
    age
    zip
    edu

Note that the remaining fields of structs 2, 3, and 4 will be empty until they are assigned some value:

>> person(3)

ans = 

  struct with fields:

    name: []
     age: 21
     zip: []
     edu: []
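
Unassigned fields can be detected with isempty(). A small sketch, using a cut-down three-person array with only the name and age fields (missingAge is an illustrative name):

```matlab
person = struct( 'name', 'Alice Anyperson', 'age', 22 );
person(2).name = 'Bob Programmer';   % age of person 2 is left unassigned
person(3).age  = 21;                 % name of person 3 is left unassigned

% Flag every person whose age field is still empty
missingAge = false( 1, length( person ) );
for i = 1:length( person )
    missingAge(i) = isempty( person(i).age );
end
```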

At this point we can iterate over the array of structures as we would iterate over any array. (The code below will not produce a meaningful average until the age field of every person is filled in: adding an empty [] to a number yields [], so the total becomes empty.)

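%Sum the age field over every element of the struct array
totalAge = 0;
totalPeople = length( person );

for i = 1:totalPeople
    totalAge = totalAge + person(i).age;
end

%Divide by the number of people to get the mean
averageAge = totalAge/totalPeople;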
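
The same average can be sketched without a loop. The expression [ person.age ] concatenates the age field of every element into one row vector. The age given to person 2 below is a hypothetical value chosen only for illustration.

```matlab
person = struct( 'name', 'Alice Anyperson', 'age', 22 );
person(2).name = 'Bob Programmer';
person(2).age  = 30;    % hypothetical value, for illustration only
person(3).age  = 21;

% Collect every age into one row vector, then average it
ages = [ person.age ];
averageAge = mean( ages );
```

If any age field is still empty, [ person.age ] simply skips it, so the mean is taken over the filled-in entries only.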

Structs may be easier to use in some situations as they are not tied to the cell data type. However, cell arrays are fundamentally arrays and as such may be easier in situations where vectorized operations or iteration can be used.