MATLAB deals with vectors and matrices naturally. Such data structures are ideal for many programs we'd wish to write. However, both assume that every piece of data we'd wish to use will have an identical type. As programs become more complex we may wish to associate different types of data with one another. Doing this with vectors alone is possible but clunky, so there exist tools specifically to handle these cases.
One place this is especially apparent is with text data. Suppose you wanted to store three names in an array. If you were to give the following command, you'd get an error:
names = [ 'Alice'; 'Bob'; 'Claire' ]
Dimensions of matrices being concatenated are not consistent.
What's happening? The syntax we used above seems like it should work just fine: we want an array, where each element of the array is a different name. That's not how MATLAB sees the command. Remember that the square bracket notation with semicolons is trying to create a matrix with three rows, and also remember that MATLAB stores text strings as sequences of numeric character codes (ASCII codes, for plain English letters). That is, a string such as 'alice' is really the sequence:
>> uint8('alice')
ans =
  1×5 uint8 row vector
   97   108   105    99   101
So really, the names command above is trying to create a matrix where the first row has five characters, the second row has three characters, and the third row has six characters. Hence the explanation of the error message: MATLAB requires our matrices to be rectangular, and it can't figure out what to do when you try to build a matrix out of rows with three different lengths.
The bad solution to this problem is
to force all text to be the same length. We can do this by padding each name
with extra spaces. Note how the single quotes marking each string line up:
names = [ 'Alice ';
          'Bob   ';
          'Claire' ]
This creates a matrix with three rows and six columns. This kind of works, but it's not great. Accessing a single name now requires accessing a row and all columns within that row:
>> names(2,:)
ans =
Bob
However, this really is just a matrix. For example, you are allowed to access columns of this matrix, which is valid syntax, but it's unlikely that this is something you'd ever want to do.
>> names(:,3)
ans =
i
b
a
The bigger problem is that all names must now be stored as strings of the same length. What happens if we want to add 'Alfredo' (which has seven characters) to our list of names? Every existing row would have to be padded again to the new width.
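To make the cost of padding concrete, here is a minimal sketch that uses MATLAB's built-in char() function, which pads its arguments with trailing blanks, and strtrim(), which removes them. The variable names sizeBefore, sizeAfter, padded, and trimmed are only illustrative.

```matlab
% char() with several arguments pads every row with trailing blanks
% so that all rows match the longest name.
names = char('Alice', 'Bob', 'Claire');
sizeBefore = size(names);            % 3 rows, 6 columns

% Adding a seven-character name means rebuilding the matrix at the new width.
names = char('Alice', 'Bob', 'Claire', 'Alfredo');
sizeAfter = size(names);             % 4 rows, 7 columns

% The padding comes back with every row and has to be trimmed by hand.
padded  = names(2,:);                % 'Bob' followed by four blanks
trimmed = strtrim(padded);           % 'Bob'
```

Automatic padding removes the need to count spaces, but the result is still a rectangular character matrix, so every lookup has to strip the blanks again.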
More generally, we'd like to associate different data types with one another. For example, suppose we had several characters and we'd like to associate each with a numeric value. We might want to come up with a matrix such as:
M = [ 50 'A' ; 100 'B' ; 150 'C' ]
However, MATLAB doesn't know quite how to deal with this. It sees the presence of the characters 'A', 'B', and 'C' and automatically tries to interpret all of the data as characters. If you enter the above into your command window you'll get a nonsense result out the other side, which comes from interpreting 50 as ASCII character '2', 100 as character 'd', and so on.
M =
2A
dB
-C
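A quick way to confirm what happened is to inspect the result. The sketch below rebuilds the same matrix and checks its type; the variable names kind, first, and codes are only illustrative.

```matlab
M = [ 50 'A' ; 100 'B' ; 150 'C' ];

kind  = class(M);          % 'char': the numbers were converted to characters
first = M(1,1);            % '2', the character whose code is 50
codes = double(M(:,1));    % [50; 100; 150], recovered from the character codes
```

The numeric values can be recovered with double(), but only because we remember which column was meant to hold numbers. Nothing in M itself records that.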
The easiest solution might be to just declare two different vectors:
ints = [ 50; 100; 150 ]
chars = [ 'A'; 'B'; 'C' ]
This is a reasonable workaround, but we will always have to remember to access both vectors simultaneously for this to work. It is also not perfect: it makes some sensible operations awkward. Suppose we had a function called shuffle that shuffles an array:
shuffle = @(v) v(randperm(length(v)))
We can individually shuffle our two arrays, but this function gives us no way to shuffle the pairs of values together. If we shuffle both arrays individually, each call draws its own random order and we may get mismatched results:
>> shuffle( ints )
ans =
   150
    50
   100
>> shuffle( chars )
ans =
B
C
A
Cells are containers that may hold arbitrary data. An array of cells thus allows a programmer to have a single array-like structure that holds entirely different types of data. This is a great way to work with real-world data that doesn't fit into nice, rectangular matrices of numeric values.
Consider our text data problem above. What we really want is an array of names, without regard for how character data is converted to ASCII and stored in MATLAB. With a cell array this is easy. Note that we're using curly braces instead of the square brackets that we tried above:
%Declare cell arrays using curly braces
>> names = { 'Alice'; 'Bob'; 'Claire' }
names =
  3×1 cell array
    'Alice'
    'Bob'
    'Claire'
That was easy! This declares a cell array with three rows and one column. Declaring matrices of cells works the way you would expect as well, with cells in the same row being separated by spaces (or commas) and cells in successive rows being separated by semicolons:
%Declare cell arrays using curly braces
>> names = { 'Alice' 'David'; 'Bob' 'Erica'; 'Claire' 'Fred' }
names =
  3×2 cell array
    'Alice'     'David'
    'Bob'       'Erica'
    'Claire'    'Fred'
Observe carefully that row and column indices now refer to cell position and not to the underlying data. For example, if we ask for row 2 and column 1 we will get the whole cell containing 'Bob'.
>> names(2,1)
ans =
  cell
    'Bob'
Furthermore, note that the type of data returned is not a character string, but a cell. The major downside of cell arrays is that suddenly everything is a cell, and working with cells requires a lot of explicit conversions between cells and the other data types (scalar numbers, text strings) that we're used to working with. Indexing with parentheses does not give direct access to the data inside a cell. For example, if we wanted to access the second character of 'Bob', the obvious approach does not work:
>> cellVar = names(2,1)
cellVar =
  cell
    'Bob'
>> cellVar(2)
Index exceeds matrix dimensions.
What happened? The variable cellVar is a 1x1 cell. There is no second element to index. What we really want is the text string that's contained in the cell:
>> cellVar = names(2,1);
>> strVar = char( cellVar ) %Convert cell to string
strVar =
Bob
>> strVar(2)
ans =
o
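An alternative to converting with char() is to index with curly braces, which returns the contents of a cell rather than the cell itself. The following is a minimal sketch using the names array from above; the variable names asCell, asText, and second are only illustrative.

```matlab
names = { 'Alice' 'David'; 'Bob' 'Erica'; 'Claire' 'Fred' };

asCell = names(2,1);       % parentheses: a 1x1 cell containing 'Bob'
asText = names{2,1};       % curly braces: the character vector 'Bob' itself
second = names{2,1}(2);    % index into the contents directly: 'o'
```

Parentheses always answer with cells and curly braces always answer with contents, so the choice of bracket determines whether a conversion is needed afterwards.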
Assigning to cell arrays has very similar problems. Suppose we want to change 'Fred' to 'Francine'. Simple assignment doesn't work:
names(3,2) = 'Francine'
Conversion to cell from char is not possible.
The solution is to explicitly convert the text string 'Francine' to a cell before assignment:
>> names(3,2) = cellstr('Francine')
names =
  3×2 cell array
    'Alice'     'David'
    'Bob'       'Erica'
    'Claire'    'Francine'
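The same assignment can also be written without cellstr(). The sketch below shows two equivalent forms: curly braces on the left-hand side, or parentheses with a cell on the right-hand side. The replacement name 'Clara' is a made-up value for illustration.

```matlab
names = { 'Alice' 'David'; 'Bob' 'Erica'; 'Claire' 'Fred' };

names{3,2} = 'Francine';       % curly braces: assign the contents of the cell
names(3,1) = { 'Clara' };      % parentheses: the right-hand side must be a cell
```

Both forms leave names as a 3×2 cell array; only the targeted cells change.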
Most functions involving cell arrays are designed to convert to and from cells. The functions iscell() and iscellstr() can be used to figure out what kind of data you're working with.
num2cell() - convert a numeric scalar or array into a cell array, one element per cell
cellstr() - convert a text string into a cell string
cell2mat() - convert a cell array of numbers into an ordinary numeric array
char() - convert a cell string into a text string
iscell() - determines whether a variable is a cell array or not
iscellstr() - determines whether a variable is a cell array that contains only text strings
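The functions listed above can be tried in a short round trip. This is only a sketch with made-up values; the variable names c, m, s, and t are illustrative.

```matlab
c = num2cell([10 20 30]);   % 1x3 cell array, one number per cell
m = cell2mat(c);            % back to the numeric row vector [10 20 30]

s = cellstr('Alice');       % 1x1 cell array holding 'Alice'
t = char(s);                % back to the character vector 'Alice'

cIsCell    = iscell(c);     % true
cIsCellstr = iscellstr(c);  % false, c holds numbers rather than text
sIsCellstr = iscellstr(s);  % true
```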
As mentioned above, a huge advantage of using cell arrays is the possibility of having mixed data types in the same data structure. As before, declaring this kind of aggregate data is easy:
>> data = { 50 'Alice'; 100 'Bob'; 150 'Claire' }
data =
  3×2 cell array
    [ 50]    'Alice'
    [100]    'Bob'
    [150]    'Claire'
This makes it easy to associate different types of data with one another, and to treat those data as a single unit:
%Exchange rows of data
>> temp = data(2,:)
temp =
  1×2 cell array
    [100]    'Bob'
>> data(2,:) = data(3,:);
>> data(3,:) = temp;
>> data
data =
  3×2 cell array
    [ 50]    'Alice'
    [150]    'Claire'
    [100]    'Bob'
However, one must be mindful of converting to and from the underlying data types:
>> data(2,1) = num2cell( 200 );
>> data(2,2) = cellstr( 'David' )
data =
  3×2 cell array
    [ 50]    'Alice'
    [200]    'David'
    [100]    'Bob'
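Because each row of the cell array holds one number and its matching name, the shuffle problem from earlier has a natural answer: reorder whole rows. The following is a minimal sketch, with order, shuffled, and values as illustrative variable names.

```matlab
data = { 50 'Alice'; 100 'Bob'; 150 'Claire' };

order    = randperm(size(data, 1));   % random ordering of the row numbers
shuffled = data(order, :);            % whole rows move, so pairs stay matched

values = cell2mat(shuffled(:,1));     % numeric column as an ordinary vector
```

The output order differs from run to run, but each number always stays beside the name it started with.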
Structures, called structs, are an alternate method for handling aggregate data in MATLAB. While cell arrays tend to be somewhat MATLAB-specific in their semantics, almost all programming languages support something very similar or identical to structs. Structs allow you to store and reference data in named fields.
The easiest way to create a one-off structure is with the struct() function. This creates a structure with no fields, which can then be assigned to with "dot notation".
>> person = struct();
>> person.name = 'Alice Anyperson';
>> person.age = 22;
>> person.zip = 63103;
>> person.edu = 'BS Computer Science';
>> person
person =
  struct with fields:
    name: 'Alice Anyperson'
     age: 22
     zip: 63103
     edu: 'BS Computer Science'
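The struct() function also accepts field names and values as alternating arguments, which builds the same structure in one statement. The sketch below reuses the values from above; fields and hasAge are illustrative variable names.

```matlab
person = struct('name', 'Alice Anyperson', ...
                'age',  22, ...
                'zip',  63103, ...
                'edu',  'BS Computer Science');

fields = fieldnames(person);      % 4x1 cell array of the field names
hasAge = isfield(person, 'age');  % true
```

One caveat with this form: if a value is itself a cell array, struct() creates an array of structures with one element per cell, rather than a single structure.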
Creating an array of structs is less intuitive than it could be. Extending our first struct into an array with an empty struct doesn't work exactly the way you'd hope (even though we could extend a scalar value into an array with a similar approach).
>> person(2) = struct()
Subscripted assignment between dissimilar structures.
This is because MATLAB only allows an array of structures with identical fields. The first element of the array has four fields: name, age, zip, and edu. However, the empty structure returned by struct() has no fields. The way to create the subsequent structures that we'd like is simply to assign to some field that exists in the first struct.
>> person(2).name = 'Bob Programmer';
>> person(3).age = 21;
>> person(4).edu = 'MS Mechanical Engineering'
person =
  1×4 struct array with fields:
    name
    age
    zip
    edu
Note that the remaining fields of structs 2, 3, and 4 will be empty until they are assigned some value:
>> person(3)
ans =
  struct with fields:
    name: []
     age: 21
     zip: []
     edu: []
At this point we can iterate over the array of structures as we would iterate over any array. (The code below will not give a meaningful result until the age field of each person is filled out: adding an empty age to the total makes the total empty.)
totalAge = 0;
totalPeople = length( person );
for i = 1:length( person )
    totalAge = totalAge + person(i).age;
end
averageAge = totalAge/totalPeople;
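The loop above can also be written without explicit iteration, since person.age on a struct array expands to a comma-separated list that square brackets collect into a vector. The sketch below builds a small two-element array for illustration; Bob's age of 21 is a made-up value.

```matlab
% Passing cell arrays as values creates one structure per cell.
person = struct('name', {'Alice Anyperson', 'Bob Programmer'}, ...
                'age',  {22, 21});

ages       = [person.age];   % [22 21]
averageAge = mean(ages);     % 21.5
```

Note that an element whose age is empty contributes nothing to the concatenation, so with missing data this form averages only the people whose age has been filled in.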
Structs may be easier to use in some situations as they are not tied to the cell data type. However, cell arrays are fundamentally arrays and as such may be easier in situations where vectorized operations or iteration can be used.