Labelling Art Collections Using Machine Learning

Developing a system that recommends the most relevant tags for artworks, mimicking how human experts label them while suggesting additional tags by combining supervised and pre-trained models.

The Opportunity

Our client, a renowned art institution, was seeking to enhance the accuracy and relevance of their artwork tags, making it easier for users to navigate their vast collections. They needed a system that not only mimics the labelling traditionally done by human experts but also recommends additional relevant tags, potentially overlooked in the initial labelling process.

What we did

✔︎ Applied supervised and pre-trained models for tag predictions
✔︎ Developed a two-step ensemble model
✔︎ Created an interpretability system for predictions
✔︎ Built an API endpoint for tag recommendations
✔︎ Conducted a thorough data quality audit

The Results

The application of machine learning for art tagging has transformed the way users interact with our client's collection. The system has made it simpler for users to find relevant artworks while also uncovering overlooked connections within the collection.

101k artworks were matched to 17.5k tags

How we did it

Our approach combined supervised and pre-trained models. It mimicked the way human labellers have historically assigned tags, while also recommending new tags based on their semantic similarity to the item's metadata.
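To make the idea of recommending tags by semantic similarity concrete, here is a minimal sketch rather than the production system. It ranks candidate tags by cosine similarity between a vector for the artwork's metadata and a vector for each tag. The `embed` function is a toy bag-of-words stand-in for a pre-trained sentence encoder, used so the example runs without downloading a model. All function names, the sample metadata, and the sample tags are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a pre-trained encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tags(metadata: str, tags: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k tags most similar to the artwork metadata."""
    query = embed(metadata)
    scored = [(tag, cosine(query, embed(tag))) for tag in tags]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical example data, not taken from the client's collection.
metadata = "oil painting of a river landscape at dusk"
candidate_tags = ["river landscape", "portrait", "oil painting", "sculpture"]
recommendations = recommend_tags(metadata, candidate_tags, top_k=2)
```

With a real sentence encoder, tags that share no words with the metadata could still score highly, which is the main benefit of embedding-based similarity over word overlap.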

The most granular tag level contains too many tags for a purely supervised model to scale well. To handle it, we used MPNet, a pre-trained language model developed by Microsoft that performs strongly on semantic search tasks. We then added an ensemble model that combines the advantages of the supervised and pre-trained approaches.
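The case study does not say how the ensemble combines the two models, so the following is only one simple possibility: a weighted average of per-tag scores. The function names, the `weight` parameter, and the sample scores are hypothetical.

```python
def ensemble_scores(
    supervised: dict[str, float],
    semantic: dict[str, float],
    weight: float = 0.5,
) -> dict[str, float]:
    """Blend per-tag scores from two models; a missing tag counts as 0.0."""
    all_tags = supervised.keys() | semantic.keys()
    return {
        tag: weight * supervised.get(tag, 0.0) + (1 - weight) * semantic.get(tag, 0.0)
        for tag in all_tags
    }


def top_tags(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring tags, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores: the supervised model only knows tags seen in training,
# while the semantic model can also propose tags it was never trained on.
supervised_scores = {"landscape": 0.9, "portrait": 0.2}
semantic_scores = {"landscape": 0.6, "river": 0.8}

combined = ensemble_scores(supervised_scores, semantic_scores, weight=0.5)
best = top_tags(combined, k=2)
```

Under this scheme, a tag that only one model proposes can still be recommended, while tags both models agree on rise to the top.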

To ensure transparency, we built an interpretability system that explains why specific tags are predicted, and an API endpoint that provides these recommendations. This, combined with regular data audits, helps maintain the system's effectiveness over time.

Moreover, we suggested the following strategies for further development:
● Implementing data quality monitoring and improvement systems
● Crowdsourcing tag assignment to users
● Utilising existing data such as artwork images for better predictions
● Using an ensemble model that benefits from a variety of data sources

Our data science team worked closely with the client's in-house team to integrate the machine learning model smoothly with their existing infrastructure. As a result of this collaboration, the system is not a 'black box' but a well-understood, interpretable component of their operations.
