Generative AI models are trained on vast amounts of data, drawn from books, articles, journals, images, music and other forms of media. This data helps the models learn the patterns, structures and relationships within that material.
For this example, I will be discussing how an AI model can be trained to successfully identify images of cats and dogs.
Since this AI model is being trained to identify images of cats and dogs, the training data will consist of images of dogs and cats, with each dog labelled as ‘dog’ and each cat as ‘cat’. We will need a large number of images of various breeds of cats and dogs for our model to work. For this we could use a public dataset or a private one of our own creation.
To begin, I will download the dataset and pre-process the images by resizing them so that they all have the same dimensions for the model to process effectively. Additionally, we may apply some image augmentation techniques, like flipping or rotating the images, to artificially increase the dataset's size and improve the model's generalisation.
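To make the resizing and augmentation step concrete, here is a minimal sketch in plain Python. It assumes each image is a simple grid (a list of rows) of pixel values, which is a simplification of my own; a real project would more likely use an imaging library. The function names are hypothetical and chosen only for illustration.

```python
# Assumption: an image is a list of rows, each row a list of pixel values.

def resize_nearest(image, new_height, new_width):
    """Resize an image using nearest-neighbour sampling."""
    old_height = len(image)
    old_width = len(image[0])
    return [
        [
            # Map each output pixel back to the closest source pixel.
            image[row * old_height // new_height][col * old_width // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two augmented variants."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]
```

Calling augment on every resized training image would triple the number of examples. Each variant keeps the label of the image it came from, since a flipped cat is still a cat.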
Next, we will define our deep learning model and its architecture. We will use a Convolutional Neural Network (CNN), a specialised type of neural network that is particularly effective for image recognition tasks. A CNN will be able to automatically learn spatial hierarchies of features, such as edges, shapes, and textures, making it ideal for identifying cats and dogs in various image conditions.
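To show what a single convolutional layer does, here is a small sketch of the sliding-window operation at the heart of a CNN, again in plain Python. The kernel below is hand-picked to respond to vertical edges; this is an assumption made for illustration, because in a trained CNN the kernel values are learned from the data rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding and a stride of 1.

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kernel_height = len(kernel)
    kernel_width = len(kernel[0])
    out_height = len(image) - kernel_height + 1
    out_width = len(image[0]) - kernel_width + 1

    output = []
    for r in range(out_height):
        out_row = []
        for c in range(out_width):
            total = 0
            # Multiply the kernel with the patch beneath it and sum up.
            for i in range(kernel_height):
                for j in range(kernel_width):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


def relu(feature_map):
    """Replace negative responses with zero, a common activation step."""
    return [[max(0, value) for value in row] for row in feature_map]


# Illustrative kernel: responds where pixels get brighter from left to right.
VERTICAL_EDGE_KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
```

Applied to an image that is dark on the left and bright on the right, this kernel gives a strong response near the boundary and zero over flat regions. Stacking many such layers is what lets a CNN build up from edges to shapes and textures.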
Once the model architecture is defined, we will train the model on the preprocessed images, using a training set and a validation set to monitor its performance. The model will learn to distinguish between images of cats and dogs by adjusting its internal parameters based on the error (or loss) between the predicted and actual labels.
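The idea of adjusting parameters based on the loss can be sketched with a much simpler model than a CNN. The example below is an assumption-laden stand-in: it trains a single-layer logistic classifier with gradient descent, treating each image as a short list of numeric features with a label of 1 for ‘dog’ and 0 for ‘cat’. The function names, learning rate and epoch count are all illustrative choices, not recommended settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Return the predicted probability that the example is a dog."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def mean_loss(weights, bias, examples):
    """Average binary cross-entropy between predictions and labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in examples:
        p = predict(weights, bias, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(examples)


def train(train_set, val_set, epochs=200, learning_rate=0.5):
    """Fit the model on train_set and record the loss on both sets."""
    n_features = len(train_set[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    history = []
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in train_set:
            # Error between the predicted and the actual label.
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        m = len(train_set)
        # Move each parameter a small step against its gradient.
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
        history.append(
            (mean_loss(weights, bias, train_set), mean_loss(weights, bias, val_set))
        )
    return weights, bias, history
```

The validation set is never used to update the parameters. Its loss is only recorded, so that a training loss that keeps falling while the validation loss rises can be spotted as a sign of overfitting.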
After training is complete, we will evaluate the model on a separate test set to check its ability to generalise to new, unseen images. We can then adjust the model as needed to improve its results; this is called tuning.
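As a rough sketch of how the data could be divided and the final check carried out, the snippet below splits a labelled dataset into training, validation and test portions and then measures accuracy on the test portion. The split fractions, the seed and the function names are assumptions chosen for illustration, and classify stands for any function that maps an image to a predicted label.

```python
import random


def split_dataset(examples, train_fraction=0.7, val_fraction=0.15, seed=0):
    """Shuffle the examples and split them into train, validation and test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_train = int(len(shuffled) * train_fraction)
    n_val = int(len(shuffled) * val_fraction)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains
    return train_set, val_set, test_set


def accuracy(classify, test_set):
    """Fraction of test examples whose predicted label matches the true one."""
    correct = sum(1 for image, label in test_set if classify(image) == label)
    return correct / len(test_set)
```

Keeping the test set apart until the very end matters, because a model that has already seen an image during training or tuning will tend to look better on it than it would on a genuinely new one.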
If the model performs well, correctly identifying a high proportion of the images in the current dataset, we can then deploy it to classify new images of cats and dogs in real-time applications.
Bayliss, E (2025) Dog VS Cat series [Digital Artworks]