Generalizations of Convolutional Neural Networks for 1D and 3D Data
While convolutional neural networks (ConvNets) are predominantly applied to 2D data such as images, their principles can also be extended to 1D and 3D data.
1D data
1D convolutions are particularly useful for processing time-series data, audio signals, and any form of sequential data. Here's how a 1D convolution operates:
- Consider an input signal with a shape of 14 × 1, i.e. a sequence of length 14 with a single channel.
- Applying 16 filters of size 5 with a stride of 1 gives an output of shape 10 × 16.
- Applying a further 32 filters of size 5, again with a stride of 1, gives an output of shape 6 × 32.
A 1D filter slides along the signal, applying the same transformation at each position, and the output is a vector of transformed features. The size of the output can be determined using the formula:

$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - f}{s} \right\rfloor + 1$$

where $n_{\text{in}}$ is the input length, $f$ the filter size, $p$ the padding, and $s$ the stride.
Because there is only one spatial dimension, each filter produces a vector of outputs. Recurrent Neural Networks (RNNs) are often the go-to models for such data because of their ability to capture temporal dependencies; however, 1D convolutions can be more efficient and easier to train.
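To make the shape arithmetic concrete, here is a minimal sketch assuming PyTorch (the text does not name a framework) and the illustrative values above. It reproduces the 10 × 16 and 6 × 32 output shapes, using PyTorch's channels-first layout:

```python
import torch
import torch.nn as nn

# Toy 1D input: batch of 1, 1 channel, length 14 (channels-first layout).
x = torch.randn(1, 1, 14)

# 16 filters of size 5, stride 1, no padding.
conv1 = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5, stride=1)
# 32 filters of size 5, stride 1, no padding.
conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5, stride=1)

h = conv1(x)
y = conv2(h)
print(h.shape)  # torch.Size([1, 16, 10]) -> the 10 x 16 feature map
print(y.shape)  # torch.Size([1, 32, 6])  -> the 6 x 32 feature map
```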
3D data
3D convolutions are essential for analyzing volumetric data. Such data arises, for example, as medical images from CT scans or as sequences of images over time, such as video frames.
An instance of 3D convolution might involve:
- An input volume with shape 14 × 14 × 14 × 1, i.e. a single-channel volume.
- Applying 16 filters, each with dimensions 5 × 5 × 5 and a stride of 1, results in an output volume of 10 × 10 × 10 × 16.
- Further applying 32 filters with the same dimensions reduces the volume to 6 × 6 × 6 × 32.
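The same kind of check works for the volumetric case; the sketch below again assumes PyTorch and the illustrative values from the list:

```python
import torch
import torch.nn as nn

# Toy volumetric input: batch of 1, 1 channel, 14 x 14 x 14 voxels.
x = torch.randn(1, 1, 14, 14, 14)

# 16 filters of size 5 x 5 x 5, stride 1, no padding.
conv1 = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=5, stride=1)
# 32 filters of the same size.
conv2 = nn.Conv3d(in_channels=16, out_channels=32, kernel_size=5, stride=1)

h = conv1(x)
y = conv2(h)
print(h.shape)  # torch.Size([1, 16, 10, 10, 10]) -> 10 x 10 x 10 x 16
print(y.shape)  # torch.Size([1, 32, 6, 6, 6])    -> 6 x 6 x 6 x 32
```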
The same principle for calculating the output dimensions applies here as well, evaluated once for each spatial dimension:

$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - f}{s} \right\rfloor + 1$$
For 3D volumes the formula is simply applied to each spatial dimension in turn, accounting for filter size, padding, and stride; the floor function guarantees that the resulting size is an integer.
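As a final sanity check, the output-size formula can be coded directly. The helper below, `conv_output_size`, is a hypothetical name introduced here for illustration; it reproduces the sizes from both examples above:

```python
from math import floor

def conv_output_size(n_in: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output size along one dimension: floor((n_in + 2p - f) / s) + 1."""
    return floor((n_in + 2 * p - f) / s) + 1

# 1D example: length 14, filter size 5, stride 1, no padding.
print(conv_output_size(14, 5))                       # 10
print(conv_output_size(conv_output_size(14, 5), 5))  # 6

# 3D example: the same formula is applied to each of the three spatial dimensions.
print([conv_output_size(14, 5) for _ in range(3)])   # [10, 10, 10]
```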