Generalizations of Convolutional Neural Networks for 1D and 3D Data
While convolutional neural networks (ConvNets) are predominantly applied to 2D data such as images, their principles can also be extended to 1D and 3D data.
1D data
1D convolutions are particularly useful for processing time-series data, audio signals, and any form of sequential data. Here's how a 1D convolution operates:
- Consider an input signal with a shape of 14 × 1, i.e. a sequence of length 14 with a single channel.
- Applying 16 filters of size 5 with a stride of 1 gives an output of shape 10 × 16.
- Applying a further 32 filters of size 5, again with a stride of 1, gives an output of shape 6 × 32.
A 1D filter slides along the signal, applying the same transformation at each position, and the output is a vector of transformed features. The size of the output can be determined using the formula:

$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - f}{s} \right\rfloor + 1$$

where $n_{\text{in}}$ is the input length, $f$ the filter size, $p$ the padding, and $s$ the stride.
Because there is only one spatial dimension, each filter produces a vector of outputs. Recurrent Neural Networks (RNNs) are often the go-to models for such data because of their ability to capture temporal dependencies; however, 1D convolutions can be more efficient and easier to train.
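To make the shape arithmetic concrete, here is a minimal sketch assuming PyTorch (the text does not name a framework) and the illustrative values above. It reproduces the 10 × 16 and 6 × 32 output shapes, using PyTorch's channels-first layout:

```python
import torch
import torch.nn as nn

# Toy 1D input: batch of 1, 1 channel, length 14 (channels-first layout).
x = torch.randn(1, 1, 14)

# 16 filters of size 5, stride 1, no padding.
conv1 = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5, stride=1)
# 32 filters of size 5, stride 1, no padding.
conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5, stride=1)

h = conv1(x)
y = conv2(h)
print(h.shape)  # torch.Size([1, 16, 10]) -> the 10 x 16 feature map
print(y.shape)  # torch.Size([1, 32, 6])  -> the 6 x 32 feature map
```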
3D data
3D convolutions are essential for analyzing volumetric data. Such data arises, for example, as medical images from CT scans or as sequences of images over time, such as video frames.
An instance of 3D convolution might involve:
- An input volume with shape 14 × 14 × 14 × 1, i.e. a single-channel volume.
- Applying 16 filters, each with dimensions 5 × 5 × 5 and a stride of 1, results in an output volume of 10 × 10 × 10 × 16.
- Further applying 32 filters with the same dimensions reduces the volume to 6 × 6 × 6 × 32.
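The same kind of check works for the volumetric case; the sketch below again assumes PyTorch and the illustrative values from the list:

```python
import torch
import torch.nn as nn

# Toy volumetric input: batch of 1, 1 channel, 14 x 14 x 14 voxels.
x = torch.randn(1, 1, 14, 14, 14)

# 16 filters of size 5 x 5 x 5, stride 1, no padding.
conv1 = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=5, stride=1)
# 32 filters of the same size.
conv2 = nn.Conv3d(in_channels=16, out_channels=32, kernel_size=5, stride=1)

h = conv1(x)
y = conv2(h)
print(h.shape)  # torch.Size([1, 16, 10, 10, 10]) -> 10 x 10 x 10 x 16
print(y.shape)  # torch.Size([1, 32, 6, 6, 6])    -> 6 x 6 x 6 x 32
```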
The same principle for calculating the output dimensions applies here as well, evaluated once for each spatial dimension:

$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - f}{s} \right\rfloor + 1$$
For 3D volumes the formula is simply applied to each spatial dimension in turn, accounting for filter size, padding, and stride; the floor function guarantees that the resulting size is an integer.
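As a final sanity check, the output-size formula can be coded directly. The helper below, `conv_output_size`, is a hypothetical name introduced here for illustration; it reproduces the sizes from both examples above:

```python
from math import floor

def conv_output_size(n_in: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output size along one dimension: floor((n_in + 2p - f) / s) + 1."""
    return floor((n_in + 2 * p - f) / s) + 1

# 1D example: length 14, filter size 5, stride 1, no padding.
print(conv_output_size(14, 5))                       # 10
print(conv_output_size(conv_output_size(14, 5), 5))  # 6

# 3D example: the same formula is applied to each of the three spatial dimensions.
print([conv_output_size(14, 5) for _ in range(3)])   # [10, 10, 10]
```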