Artificial Intelligence (AI) Is Not an End-All to Metadata; It Is Built From It
I have a lot of clients asking me about artificial intelligence. And a lot of programmer types telling me how librarians are going to be obsolete because "AI is coming." But I am not worried. Yes, AI is smart. Yes, it can tag an image of a boat, or a tree, or a dog, and that is very cool. But what the folks selling you on the promise of AI aren't exactly highlighting is the amount of data that you're going to need to feed into your artificial/augmented intelligence or machine learning (ML) system for it to be able to automatically and consistently identify and tag your products, logos, lesser-known models, etc. And as quickly as product design, branding, and marketing campaigns change these days, you'll be moving on to the next massive batch of data to educate your system by the time it learns enough about your last product to really be "smart."
And here's another thing: even with all of the data you're going to feed your wonderful AI/ML to make it so smart about your content (many thousands of images of your product from every possible angle), you are still going to need to tell it something about those images. For example, even if you feed your smart machine enough images of your face cream product that it can begin to recognize it, you still need to tell it that it's a face cream, that it belongs to your Acme brand, that it is for evening application specifically, and everything else that your smart machine needs to know about it in order to "automatically" apply any of that useful information.
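To make that concrete, here is a minimal Python sketch. It assumes a hypothetical recognizer that hands back nothing more than a label such as "acme_face_cream"; the ProductRecord fields and the PRODUCT_CATALOG contents are invented for illustration. The point is only that everything useful in the output comes from a record a person wrote, not from the recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductRecord:
    """Context a person has to supply; the recognizer cannot infer it."""
    product_type: str
    brand: str
    usage: str


# Hypothetical, hand-maintained catalog keyed by the label the recognizer emits.
PRODUCT_CATALOG = {
    "acme_face_cream": ProductRecord(
        product_type="face cream",
        brand="Acme",
        usage="evening application",
    ),
}


def describe(detected_label: str) -> dict:
    """Combine a bare recognition label with human-authored metadata."""
    record = PRODUCT_CATALOG.get(detected_label)
    if record is None:
        # The machine "saw" something, but nobody has told it what that means.
        return {"label": detected_label, "needs_human_review": True}
    return {
        "label": detected_label,
        "product_type": record.product_type,
        "brand": record.brand,
        "usage": record.usage,
        "needs_human_review": False,
    }
```

Calling describe("acme_face_cream") returns the brand and usage details, while a label with no catalog entry comes back flagged for a human to look at.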
All of this is to say that you need to provide the machine with the critical context about your content and data in order for it to become "smart" about it. And there is a lot you need to teach it. So beyond feeding it oodles and oodles of data to help it learn how to identify an image of your product, for example, you will also need to provide a lot of details about the product for it to be able to tell you or your users anything of value about the image or video it identifies. You'll want to add technical, business, operational, and usage metadata at a minimum, and there is also administrative metadata, such as rights and lifecycle management information, to consider (although these are arguably less automatable).
I will not be the first to tell you that even with ample artificial intelligence tagging your content, you will still need to rely on various aspects of a well-built metadata schema, taxonomy and information architecture to produce effective and useful results. I love Ramon Forster's explanation of how you will need to link any auto-tags back to your taxonomy in order to provide any contextual relationships, synonyms or other details that you may have captured in your schema already, but that the AI/ML system is not going to know on its own. He also spends a good deal of time explaining how a lot of tedious manual efforts can be avoided without buying into fancy machine learning tools, but simply by improving on your (existing?) best practices such as using naming conventions and allowing metadata to kick off workflows.
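Here is a rough sketch of what linking auto-tags back to a taxonomy could look like in Python. The TAXONOMY dictionary, its terms, and its structure are all hypothetical; a real taxonomy would live in your DAM or a vocabulary management tool, and this is not how any particular product does it. The idea is simply that the synonyms and broader terms come from your schema, and any tag the taxonomy does not know is set aside rather than trusted.

```python
# Hypothetical taxonomy: preferred term -> synonyms and a broader term.
TAXONOMY = {
    "boat": {"synonyms": {"vessel", "watercraft"}, "broader": "vehicle"},
    "tree": {"synonyms": {"sapling"}, "broader": "plant"},
    "dog": {"synonyms": {"canine", "puppy"}, "broader": "animal"},
}


def build_lookup(taxonomy):
    """Index every preferred term and synonym by its lowercase form."""
    lookup = {}
    for preferred, entry in taxonomy.items():
        lookup[preferred.lower()] = preferred
        for synonym in entry["synonyms"]:
            lookup[synonym.lower()] = preferred
    return lookup


def link_tags(auto_tags, taxonomy):
    """Split auto-tags into taxonomy-linked terms and unmatched leftovers."""
    lookup = build_lookup(taxonomy)
    linked, unmatched = {}, []
    for tag in auto_tags:
        preferred = lookup.get(tag.strip().lower())
        if preferred is None:
            # Not in the taxonomy: leave it for a person to decide.
            unmatched.append(tag)
        else:
            linked[preferred] = {
                "broader": taxonomy[preferred]["broader"],
                "synonyms": sorted(taxonomy[preferred]["synonyms"]),
            }
    return linked, unmatched
```

With this in place, an auto-tag of "Puppy" resolves to the preferred term "dog" and picks up its broader term "animal", while a tag like "cloud" lands in the unmatched list.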
I think it is really important for people to understand the limits of technology even as it seems to be a magical revelation. These small boxes of light are capable of doing mind-boggling things, like alerting a diabetic if their blood glucose swings wildly, beating humans at Jeopardy, and even drawing some pretty psychedelic pictures. We're really just at the nascent stages of what we are doing with artificial intelligence, though. And as with all cutting-edge tech, the big players are the ones making the strides that blow our minds, and they enable us with fantastic new tools, but what they're unable to provide for us is the context to our companies, our brands, and our people that only we know. There is still knowledge locked in the minds of our staff that we must release in a pretty manual way: with the application of context-specific metadata.
I didn't want to mention this because it is one of the most brow-beaten warnings against the dream of relying on AI, but I wouldn't want anyone to accuse me of not being thorough. In search analytics and search improvement circles, there is the idea of relevance vs. return. If someone performs a search on your content, you want to return enough results to them that within those results is the item of content that they're looking for. At the same time, you want the pool of results that you return to them to all be relevant enough to the search that they performed that the item of content that they are looking for doesn't get buried underneath a mountain of other marginally relevant content.
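That balance can be put into numbers. Information retrieval usually calls the two sides recall (did the results include what the user was looking for?) and precision (how much of what came back was actually relevant?). The short Python sketch below assumes you have a hand-judged set of relevant items for a query; the asset IDs are made up.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query's result set."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: share of returned items that are relevant.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: share of relevant items that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: only two assets truly answer the query.
relevant_assets = ["a1", "a2"]

# A broad search finds both, buried among six marginal results.
broad = precision_recall(
    ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"], relevant_assets
)

# A narrow search returns one clean hit but misses the other.
narrow = precision_recall(["a1"], relevant_assets)
```

Here the broad search scores a precision of 0.25 with a recall of 1.0, and the narrow search scores a precision of 1.0 with a recall of 0.5. Neither is good on its own, which is exactly the tension.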
So how do you strike that balance? Well, that is where library science comes in. Librarians are trained in the art of metadata. We are trained to apply metadata that will allow users to access the content from any avenue from which they might approach it (author? subject? date?) but also to refrain from adding "metadata clutter" that would cause the content to show up where it is not needed. And so, one of the biggest risks with allowing AI to tag your content (with the dream of course being that it does so without human intervention) is that it could end up tagging a lot of features of the content that really aren't salient. Does the tree in the background of the image add value to the context of what is in it? Does the fact that water is discussed in a story about young women's access to private bathrooms in certain countries mean that the article should be tagged with water? There is still, at least with the level of AI that we have available to us today, a very strong argument for human intervention, if not a lot of direct human tagging, in the application of metadata to enterprise content.
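One way to keep a human in the loop is to triage auto-tags instead of accepting them wholesale. The Python sketch below assumes a hypothetical tagger that returns (tag, confidence) pairs and a controlled vocabulary for the collection; the threshold of 0.8 is arbitrary. Note that a confidence score is not salience: the tagger can be very sure there is a tree in the background. That is why a confident tag outside the vocabulary goes to a reviewer rather than straight onto the asset.

```python
def triage_tags(scored_tags, vocabulary, threshold=0.8):
    """Sort (tag, confidence) pairs into accepted, review, and dropped lists."""
    accepted, review, dropped = [], [], []
    for tag, score in scored_tags:
        in_vocabulary = tag in vocabulary
        confident = score >= threshold
        if in_vocabulary and confident:
            # Known term, high confidence: apply automatically.
            accepted.append(tag)
        elif in_vocabulary or confident:
            # Only one signal agrees: a person decides whether it is salient.
            review.append(tag)
        else:
            # Unknown term, low confidence: likely metadata clutter.
            dropped.append(tag)
    return accepted, review, dropped


# Hypothetical tagger output for a product shot taken outdoors.
scored = [("face cream", 0.95), ("tree", 0.90), ("jar", 0.60), ("sky", 0.30)]
accepted, review, dropped = triage_tags(scored, {"face cream", "jar"})
```

In this example "face cream" is applied automatically, "tree" and "jar" wait for a person, and "sky" is discarded.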
I am not advocating giving up on the dream of AI or saying that you won't be able to get any useful tags out of an AI tool. But I am saying that you are only going to be able to get the most out of it if you have a metadata schema, taxonomy, and information architecture in place, and have applied that metadata to your data (your images, text, video, and other content). Only with a solid foundation of metadata applied to your content will you have content smart enough to build a smart artificial intelligence classification capability. So, start from the start, or start from where you are. If you have a very large database filled with all of your structured and unstructured content that is classified, tagged, organized, and clean enough to begin to feed your AI tool, then you are ready to pioneer the use of proprietary AI within your organization. If you do not have that, then you should probably begin by getting your metadata in order and getting it applied to your content.
Windsor, Ralph. “What It Will Take for Artificial Intelligence to Become Useful For DAM.” CMSWire.com, CMSWire, 5 Dec. 2017, Accessed June 07, 2019, www.cmswire.com/digital-asset-management/what-it-will-take-for-artificial-intelligence-to-become-useful-for-dam/.
Walia, Amit. “Effective Artificial Intelligence Requires a Healthy Diet of Data.” CIO, IDG Communications, Inc., 10 July 2018, Accessed June 07, 2019, www.cio.com/article/3287151/effective-artificial-intelligence-requires-a-healthy-diet-of-data.html.
Forster, Ramon. “Artificial Intelligence in Content Management: Fact, Fiction and Future.” Picturepark.com, PicturePark, 18 July 2018, Accessed June 07, 2019, picturepark.com/content-management-blog/best-practices-for-dam/ai-auto-tagging-content-management-blog/.
Lacy, Ian. “FDA Approves Continuous Glucose Monitor with AI Assistant.” MDedge Endocrinology, Frontline Medical Communications Inc., 14 Mar. 2018, Accessed June 07, 2019, www.mdedge.com/endocrinology/article/160854/diabetes/fda-approves-continuous-glucose-monitor-ai-assistant.
MIT Press. “How Smart Machines Really Think.” Medium.com, MIT Press, 8 Nov. 2018, Accessed June 07, 2019, medium.com/@mitpress/how-smart-machines-really-think-a2b4a67db248.
Nelson, Paul. “What Does ‘Relevant’ Mean?” Search Technologies, Search Technologies, Accessed June 07, 2019, www.searchtechnologies.com/meaning-of-relevancy.
Kilsby, Becky. “Five Steps to Making Change Happen This Autumn Using ‘Design Thinking.’” Talented Ladies Club, 17 Sept. 2018, Accessed June 07, 2019, https://www.talentedladiesclub.com/articles/five-steps-to-making-change-happen-this-autumn-using-design-thinking/.