Labelling Art Collections Using Machine Learning
Developing a system that recommends the most relevant tags for artworks, mimicking human labelling methods while offering additional tag suggestions by leveraging supervised and pre-trained models.
The Opportunity
Our client, a renowned art institution, was seeking to enhance the accuracy and relevance of their artwork tags, making it easier for users to navigate their vast collections. They needed a system that not only mimics the labelling traditionally done by human experts but also recommends additional relevant tags, potentially overlooked in the initial labelling process.
What we did
✔︎ Applied supervised and pre-trained models for tag predictions
✔︎ Developed a two-step ensemble model
✔︎ Created an interpretability system for predictions
✔︎ Built an API endpoint for tag recommendations
✔︎ Conducted a thorough data quality audit
The Results
The application of machine learning for art tagging has transformed the way users interact with our client's collection. The system has made it simpler for users to find relevant artworks while also uncovering overlooked connections within the collection.
101k
artworks were matched to 17.5k tags
How we did it
Our approach, which combines supervised and pre-trained models, mimicked the way human labellers have historically assigned tags while also recommending new tags based on their semantic similarity to the item's metadata. To handle the scale of the most granular tag level, we used MPNet, a pre-trained language model developed by Microsoft that is well suited to semantic search tasks. We then built an ensemble model that combines the advantages of both approaches.
To ensure transparency, we built an interpretability system that explains why specific tags are predicted, and an API endpoint that serves these recommendations. This, combined with regular data audits, helps maintain the system's effectiveness over time.
We also suggested the following strategies for further development:
● Implementing data quality monitoring and improvement systems
● Crowdsourcing tag assignment to users
● Utilising existing data such as artwork images for better predictions
● Using an ensemble model that benefits from a variety of data sources
Our data science team worked closely with the client's in-house team to ensure the smooth integration of the machine learning model with their existing infrastructure. This collaboration ensured that the system is not a 'black box', but a well-understood, interpretable component of their operations.
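To make the idea of blending a supervised score with a semantic-similarity score more concrete, here is a minimal Python sketch. It is not the client system. The weighted average, the `weight` parameter, the function names and the sample data are all assumptions for illustration, and the `embed` function is a toy bag-of-words stand-in for the sentence embeddings that a pre-trained model such as MPNet would produce. Returning both component scores alongside the blended score is one simple way to show why a tag was suggested.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: lower-cased word counts.

    Assumption: a real system would use embeddings from a pre-trained
    model (for example MPNet) instead of word counts.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tags(metadata, tags, supervised_scores, weight=0.5, top_k=3):
    """Blend supervised scores with similarity scores (hypothetical scheme).

    supervised_scores maps tag -> probability from some supervised model;
    tags missing from it are treated as scoring 0.0.
    """
    item_vec = embed(metadata)
    results = []
    for tag in tags:
        similarity = cosine(item_vec, embed(tag))
        supervised = supervised_scores.get(tag, 0.0)
        blended = weight * supervised + (1 - weight) * similarity
        # Keep both components so each suggestion can be explained.
        results.append({
            "tag": tag,
            "score": blended,
            "supervised": supervised,
            "similarity": similarity,
        })
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]


if __name__ == "__main__":
    # Illustrative data only, not taken from the client's collection.
    suggestions = recommend_tags(
        metadata="oil painting of a river landscape at sunset",
        tags=["landscape", "river", "portrait", "sunset"],
        supervised_scores={"landscape": 0.9, "portrait": 0.2},
    )
    for s in suggestions:
        print(s["tag"], round(s["score"], 3), s["supervised"], round(s["similarity"], 3))
```

In this sketch a tag the supervised model has never seen, such as "river", can still surface through its similarity to the metadata, which mirrors the idea of recommending tags that the original labelling may have overlooked.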