Working with FlodymArrays

Initializing arrays

FlodymArray objects require a DimensionSet at initialization. Optionally, a name can be given. If the values are not given, the array is initialized with zeros.

There are several subclasses of FlodymArray, often with little or no changes in functionality: See the API reference of Flow, Parameter, and StockArray.

Flow object have to be passed the two Process objects they connect at initialization.

In this HOWTO, only the FlodymArray base class is used.

Further options to initialize arrays are discussed in the HOWTO on data input.

[18]:

import numpy as np
from flodym import Dimension, DimensionSet, FlodymArray

# Create a dimension set
dims = DimensionSet(
    dim_list=[
        Dimension(name="Region", letter="r", items=["EU", "US", "MEX"]),
        Dimension(name="Product", letter="p", items=["A", "B"]),
        Dimension(name="Time", letter="t", items=[2020]),
    ]
)

flow_a = FlodymArray(dims=dims["t", "p"], values=0.2 * np.ones((1, 2)))
flow_b = FlodymArray(dims=dims["r", "t"], values=0.1 * np.ones((3, 1)))
parameter_a = FlodymArray(dims=dims["r", "p"], values=0.5 * np.ones((3, 2)))

Math operations

FlodymArrays have the basic mathematical operations implemented. Let#s first create two arrays:

We write a small routine to print properties of the resulting array, and test it on the inputs:

[2]:

def show_array(arr: FlodymArray):
    print(f"  dimensions: {arr.dims.letters}")
    print(f"  shape: {arr.dims.shape()}")
    print(f"  name: {arr.name}")
    print(f"  values mean: {np.mean(arr.values):.3f}")
    print(f"  values sum: {arr.values.sum():.3f}")


print("flow_a:")
show_array(flow_a)
print("flow_b:")
show_array(flow_b)

flow_a:
  dimensions: ('t', 'p')
  shape: (1, 2)
  name: unnamed
  values mean: 0.200
  values sum: 0.400
flow_b:
  dimensions: ('r', 't')
  shape: (3, 1)
  name: unnamed
  values mean: 0.100
  values sum: 0.300

Now let’s try some operations.

[3]:

summed = flow_a + flow_b
print("flow_a:")
show_array(summed)

flow_a:
  dimensions: ('t',)
  shape: (1,)
  name: unnamed
  values mean: 0.700
  values sum: 0.700

What happened here? When adding the two flows, all dimensions that could be preserved were preserved. These are the dimensions that occur in both flow_a and flow_b, in this case only time.

Since we wouldn’t know how to split flow_a by region and flow_b by product, we have to sum the arrays to the set intersection of both arrays, and then perform the addition.

The same goes for subtraction:

[4]:

difference = flow_a - flow_b
print("difference:")
show_array(difference)

difference:
  dimensions: ('t',)
  shape: (1,)
  name: unnamed
  values mean: 0.100
  values sum: 0.100

For multiplication and division, things are different. If we multiply a flow with a parameter, which splits it along a new dimension, the resulting flow can have that new dimension. Therefore, in multiplication and division, we keep all the dimensions that appear in either of the flows, i.e. the set union.

[5]:

# recall:
print("flow_a dimensions: ", flow_a.dims.letters)
print("parameter_a dimensions: ", parameter_a.dims.letters, "\n")

multiplied = flow_a * parameter_a
print("multiplied:")
show_array(multiplied)

divided = flow_a / parameter_a
print("divided:")
show_array(divided)

flow_a dimensions:  ('t', 'p')
parameter_a dimensions:  ('r', 'p')

multiplied:
  dimensions: ('t', 'p', 'r')
  shape: (1, 2, 3)
  name: unnamed
  values mean: 0.100
  values sum: 0.600
divided:
  dimensions: ('t', 'p', 'r')
  shape: (1, 2, 3)
  name: unnamed
  values mean: 0.400
  values sum: 2.400

This may not be the dimension we want, for example we might want to sum the result over products, keeping the dimensions tie and region. There are some class methods for these kinds of operations. See the API reference for the full documentation. For our example:

[6]:

reduced = multiplied.sum_to(result_dims=("t", "r"))
print("reduced:")
show_array(reduced)

reduced:
  dimensions: ('t', 'r')
  shape: (1, 3)
  name: unnamed
  values mean: 0.200
  values sum: 0.600

With scalars

Math operations can also be performed between a FlodymArray and a scalar. The scalar is then expanded into the shape of the array before the operation is performed:

[7]:

sum_with_scalar = flow_a + 0.4

print("SUm with scalar:")
show_array(sum_with_scalar)

SUm with scalar:
  dimensions: ('t', 'p')
  shape: (1, 2)
  name: unnamed
  values mean: 0.600
  values sum: 1.200

Using just the `values` array

When a mathematical operation is not implemented, you can still work with the values array manually, which is a numpy array. We recommend using either the numpy ellipsis slice [...] or the FlodymArray.set_values() method, which both ensure keeping the correct shape of the array.

[8]:

flow_a.values[...] = 0.3
print("flow_a:")
show_array(flow_a)

flow_a.set_values(flow_a.values**2)
print("flow_a:")
show_array(flow_a)

flow_a:
  dimensions: ('t', 'p')
  shape: (1, 2)
  name: unnamed
  values mean: 0.300
  values sum: 0.600
flow_a:
  dimensions: ('t', 'p')
  shape: (1, 2)
  name: unnamed
  values mean: 0.090
  values sum: 0.180

Computing values of existing arrays, such as flows

In a flodym MFASystem, you have defined at initialization which arrays have which dimensionality. You can use that information to conveniently sum the result of an operation to the shape you defined, potentially re-ordering dimensions.

This is done using the so-called ellipsis slice [...]:

[9]:

# define and initialize values with zero
predefined_flow = FlodymArray(name="predefined", dims=dims["r", "p"])
print("predefined_flow:")
show_array(predefined_flow)

# recall:
multiplied = flow_a * parameter_a
print("multiplied:")
show_array(multiplied)

# set values of predefined_flow to the values of multiplied
predefined_flow[...] = flow_a * parameter_a

print("predefined_flow:")
show_array(predefined_flow)

predefined_flow:
  dimensions: ('r', 'p')
  shape: (3, 2)
  name: predefined
  values mean: 0.000
  values sum: 0.000
multiplied:
  dimensions: ('t', 'p', 'r')
  shape: (1, 2, 3)
  name: unnamed
  values mean: 0.045
  values sum: 0.270
predefined_flow:
  dimensions: ('r', 'p')
  shape: (3, 2)
  name: predefined
  values mean: 0.045
  values sum: 0.270

In a flodym MFASystem, this is a bit tricky, but quite important, as the flows are stored as a dictionary. (For simplicity, we only re-create these dictionaries, not the whole MFASystem)

[10]:

flows = {
    "flow_a": flow_a,
    "flow_b": flow_b,
    "predefined_flow": predefined_flow,
}
parameters = {
    "parameter_a": parameter_a,
}

The correct way to perform an operation here, is using the ellipsis slice on the left side of an assignment, as this only affects the values of the FlodymArray object:

[11]:

flows["predefined_flow"][...] = flows["flow_a"] * parameters["parameter_a"]
print("predefined_flow:")
show_array(flows["predefined_flow"])

predefined_flow:
  dimensions: ('r', 'p')
  shape: (3, 2)
  name: predefined
  values mean: 0.045
  values sum: 0.270

While the following wrong code without the ellipsis slice will overwrite the whole object, with uncontrolled outcome:

[12]:

flows["predefined_flow"] = flows["flow_a"] * parameters["parameter_a"]
print("WRONG predefined_flow:")
show_array(flows["predefined_flow"])

WRONG predefined_flow:
  dimensions: ('t', 'p', 'r')
  shape: (1, 2, 3)
  name: unnamed
  values mean: 0.045
  values sum: 0.270

Slicing

Sometimes, we don’t want to access the whole array, but just a slice. We can do this with indexing.

We can use indexing on the right-hand side of an assignment to only calculate with part of an array, and on the left-hand side, to only set the values of part of an array.

Let’s look at “getting” a slice first:

[13]:

# recall
print("flow_a dimensions: ", flow_a.dims.letters)

slice_a1 = flow_a["A"]
print("slice_a1:")
show_array(slice_a1)

flow_a dimensions:  ('t', 'p')
slice_a1:
  dimensions: ('t',)
  shape: (1,)
  name: unnamed
  values mean: 0.090
  values sum: 0.090

You can also slice along several dimensions at the same time. If you like to be more specific, you can also give the slice indexes as a dictionary. This is actually necessary if an item appears in several dimensions, such that giving only the item would be ambiguous.

[14]:

slice_a2 = flow_a["A", 2020]
print("slice_a2:")
show_array(slice_a2)

slice_a3 = flow_a[{"t": 2020}]
print("slice_a3:")
show_array(slice_a3)

slice_a4 = flow_a[{"t": 2020, "p": "A"}]
print("slice_a4:")
show_array(slice_a4)

slice_a2:
  dimensions: ()
  shape: ()
  name: unnamed
  values mean: 0.090
  values sum: 0.090
slice_a3:
  dimensions: ('p',)
  shape: (2,)
  name: unnamed
  values mean: 0.090
  values sum: 0.180
slice_a4:
  dimensions: ()
  shape: ()
  name: unnamed
  values mean: 0.090
  values sum: 0.090

As you can see, zero-dimensional FlodymArrays are possible.

Note that numpy indexing of the whole object like flow_a[0, :] is not supported, as flodym wouldn’t know if in flow_a[2020], 2020 is an index or an item of the dimension.

Of course, you can slice the values array: flow_a.values[:,0]. But we recommend not to do it. One major design goal of flodym is too keep the code flexible to changes in the dimensions, and flow_a.values[:,0] is quite inflexible with respect to the order and number of dimensions in the array, and to the order and number of items in the dimensions.

The slices we looked at just take one item along a dimension and drop that dimension in the process. If we want to access several items along one dimension, that creates a problem, as the dimension can’t be dropped, but is changed, as it does not contain all items of the original one anymore. To cope with that, we have to create a new dimension object with a new name and letter, and pass it to the slice, along with the dimension letter we’re taking a subset of:

[15]:

regions_na = Dimension(name="RegionsNA", letter="n", items=["US", "MEX"])

slice_b1 = flow_b[{"r": regions_na}]
print("slice_a5:")
show_array(slice_b1)

slice_a5:
  dimensions: ('n', 't')
  shape: (2, 1)
  name: unnamed
  values mean: 0.100
  values sum: 0.200

As mentioned earlier, You can also use slicing to only access a par of the array on the left-hand side of an assignment:

[16]:

flow_b["EU"] = flow_a["A"]
print("flow_b.values:\n", flow_b.values)

flow_b.values:
 [[0.09]
 [0.1 ]
 [0.1 ]]

On the left-hand side, it is also possible to access several items along one dimension, with the same syntax. It does not change the shape of the flow.

[17]:

flow_b[{"r": regions_na}] = flow_b[{"r": regions_na}] * 3
print("flow_b.values:\n", flow_b.values)
print("flow_b:")
show_array(flow_b)

flow_b.values:
 [[0.09]
 [0.3 ]
 [0.3 ]]
flow_b:
  dimensions: ('r', 't')
  shape: (3, 1)
  name: unnamed
  values mean: 0.230
  values sum: 0.690

Operation rules summary

Let’s summarize here the rules for dimension handling:

Additions and subtractions yield the set intersection of the two participating arrays.
Multiplications and divisions yield the set union of the participating arrays.
When setting the values of an existing array, the array on the right-hand side of the assignment is summed down to the dimensions of the left-hand side. Missing dimensions on the right-hand side will lead to an error
Scalars are converted to an array of equal dimensions before the operation is performed.

Caveat

We found these rules to yield the right behavior in almost all cases.

There are exceptions: When adding two dimensionless parameters with different dimensions, it may be intended that the dimensions of both inputs are still used.

A flodym extension is planned to account for this. In the meantime, we advise to use the FlodymArray.cast_to() method on the arrays before performing the operation.