This year we heard a lot about ‘Cognitive Computing’, with the major technology companies announcing their cognitive strategies, and next year it will get even more focus. But what is it all about?
The word ‘cognitive’ relates to conscious mental activities, such as thinking, understanding, learning, and remembering.
In relation to computing, there is no universally agreed definition for Cognitive Computing, but at a very high level, think of things such as speech recognition, image recognition, text to speech and speech to text, and suddenly it starts to sound very familiar.
At a deeper level, it comprises such things as Artificial Intelligence (AI), Machine Learning (ML), Human-Computer Interaction (HCI), Natural Language Processing (NLP) and a host of other things.
One of the reasons cognitive computing is so important is that it can humanise the interaction we have with technology: Siri telling you a joke, asking Cortana for directions, recognising the family in the photos we have taken. The interaction between person and technology no longer needs to be via a keyboard and mouse!
This year Facebook, IBM, Google, Amazon, Microsoft (and others) have all made announcements, and have initiatives based around cognitive computing.
In October this year, Microsoft researchers reported a major breakthrough in speech recognition: their technology now recognises words in conversation as well as a human does. Link to Article
Microsoft Cognitive Services – Microsoft’s implementation is called ‘Microsoft Cognitive Services’. Put simply, it is five sets of Services: Vision, Speech, Language, Knowledge and Search.
Each of the Services is a cross-platform (Windows, Web, iOS and Android) set of Application Programming Interfaces (APIs) that can be used to build applications and services. They can be called (or used) from two or three lines of code.
Below is a brief explanation of each of the services.
Vision Services – these are used to analyse static or moving images. Face recognition has been around for a while (it’s probably built into the digital camera you are using), but these services can also estimate age, gender and emotion (happy, angry, surprised etc.) and be trained to recognise specific people. In addition, they can recognise and describe objects.
Recently, near real-time analysis and description has been added – you can see it in action here Link
As an example, we all take far too many pictures with digital cameras, because it’s so easy, right? As a result, we have hundreds (or thousands if you’re me) of unorganised pictures. Imagine indexing all those pictures and categorising them by the people in them, or finding the pictures where people are happy, or the ones with my red car in. You will start to get an idea of the power of Vision Services.
Example of Vision Services in action.
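To make the photo-indexing idea a little more concrete, here is a minimal sketch in Python. It does not call Vision Services at all: it assumes some image-analysis step has already produced a set of labels for each picture, and only shows how those labels could be turned into a searchable index. The file names, labels and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical output of an image-analysis step: file name -> labels found in it.
# These file names and labels are invented for this example.
photo_tags = {
    "beach_01.jpg": {"alice", "happy", "sea"},
    "drive_07.jpg": {"red car", "bob"},
    "party_12.jpg": {"alice", "bob", "happy"},
}


def build_index(tags_by_photo):
    """Invert photo -> tags into tag -> set of photos."""
    index = defaultdict(set)
    for photo, tags in tags_by_photo.items():
        for tag in tags:
            index[tag].add(photo)
    return index


def find_photos(index, *wanted):
    """Return the sorted list of photos that carry every requested tag."""
    if not wanted:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in wanted))
    return sorted(matches)


index = build_index(photo_tags)
print(find_photos(index, "alice", "happy"))  # pictures where Alice is happy
print(find_photos(index, "red car"))         # pictures with the red car in
```

The interesting work (deciding what is in each picture) is what the service would do for you; the index itself is only a few lines.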
Speech Services – speech recognition has been around for a while, with varying degrees of success. If you have too much time on your hands, search YouTube for ‘speech recognition epic fail’ for the less successful times!
As mentioned before, this is a historic year, in which technology has matched the ability of a human to recognise the spoken word.
Part of Speech Services is what you would expect: text to speech and speech to text. Additional services include the ability to train the service to recognise who is speaking.
One of the new services is called the Custom Recognition Intelligent Service (or CRIS). In summary, this service allows you to train and fine-tune the speech service to allow for accents, background noise and even vocabulary. For instance, children use a different voice and vocabulary to adults.
The ability to ‘train’ a model to better recognise and understand the spoken word and context is key to improving the accuracy of recognition.
As an example, say you record the audio from a meeting and want to automatically produce a transcription of it. Using this service you could do the transcription and also identify in the transcript who was talking at the time.
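The meeting example boils down to combining two results: what was said and when, and who was speaking and when. Here is a minimal Python sketch of that combining step only. It assumes both results already exist as simple lists; the data shapes, names and timings are hypothetical and are not the format any speech service returns.

```python
# Hypothetical speech-to-text result: (start time in seconds, recognised phrase).
phrases = [
    (0.0, "Shall we start?"),
    (2.5, "Yes, first item is the budget."),
    (6.0, "I have the figures here."),
]

# Hypothetical speaker-recognition result: (start, end, speaker name).
speaker_turns = [
    (0.0, 2.0, "Ann"),
    (2.0, 5.5, "Raj"),
    (5.5, 9.0, "Ann"),
]


def speaker_at(time, turns):
    """Return the speaker whose turn covers the given time, or 'Unknown'."""
    for start, end, name in turns:
        if start <= time < end:
            return name
    return "Unknown"


def label_transcript(phrases, turns):
    """Prefix each phrase with the speaker who was talking when it started."""
    return [f"{speaker_at(start, turns)}: {text}" for start, text in phrases]


for line in label_transcript(phrases, speaker_turns):
    print(line)
```

A real meeting would have people talking over each other, so treat this as an illustration of the idea rather than a recipe.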
You can try out the untrained speech recognition here Link to Speech Recognition
Language Services – as with Speech Services, some of the Language Services have been around for some time. The Bing Spell Check Service moves things on to the next level: it is constantly updated as new words and slang (such as ‘selfie’) get added to the Bing vocabulary, and it is multi-language aware.
In addition, there are Services like the Text Analytics Service, which can detect positive or negative sentiment, or detect the language used, or extract key words and phrases from a document so it can be summarised.
One of the key problems in human-computer interaction is getting the computer to understand what a person wants, and to find the pieces of information that are relevant to their intent.
To help with this there is the Language Understanding Intelligent Service (LUIS). LUIS can be trained to understand the context around the question asked.
As an example, if you said ‘turn the lights off’, the intent would be to ‘turn off’ and the entity would be the ‘lights’. LUIS can be trained to understand the intent and entities in your statements – there is a great working example here Link
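If ‘intent’ and ‘entity’ still sound abstract, this toy Python sketch shows the kind of answer you are after for the lights example. To be clear, it is not LUIS and not how LUIS works: it is a hand-written keyword matcher standing in for a trained model, and the intent names, word lists and result layout are assumptions made for illustration.

```python
import re

# Toy stand-in for a trained language-understanding model.
# Intent names, phrases and entities below are invented for this example.
INTENT_PHRASES = {
    "turn_off": ["turn off", "switch off"],
    "turn_on": ["turn on", "switch on"],
}
KNOWN_ENTITIES = ["lights", "heating", "radio"]


def parse(utterance):
    """Return a guessed intent and the known entities found in the utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    intent = "none"
    for name, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            verb, particle = phrase.split()
            # Accept both "turn off the lights" and "turn the lights off":
            # the particle only has to appear somewhere after the verb.
            if verb in words and particle in words[words.index(verb) + 1:]:
                intent = name
    entities = [entity for entity in KNOWN_ENTITIES if entity in words]
    return {"intent": intent, "entities": entities}


print(parse("Turn the lights off"))
print(parse("Please switch on the radio."))
```

A hand-written matcher like this falls over as soon as someone says ‘it’s too bright in here’, which is exactly why a service you can train on real examples is useful.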
Knowledge Services – in some ways one of the harder sets of Services to explain!
Some of the services help explore relationships between academic papers, journals and authors. Easier to explain are the recommendation services, which build an understanding of things frequently bought together and of ‘if you liked this, you might like this’. Obviously, this is a key part of the way we buy online nowadays.
As an example, you are on an e-commerce web site buying widgets, and the service suggests that you might also like to buy a washer that is needed for that widget. As another example, a customer’s viewing history for movies can be used to recommend additional movies and shows of interest.
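The ‘frequently bought together’ idea can be sketched in a few lines of Python by counting how often two items appear in the same order. This is only the simplest possible version of the idea, under the assumption that past orders are available as plain sets of item names; it is not how the recommendation service itself is implemented, and the order data is invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each order is the set of items bought together.
orders = [
    {"widget", "washer"},
    {"widget", "washer", "spanner"},
    {"widget", "spanner"},
    {"washer", "bolt"},
    {"widget", "washer"},
]


def count_pairs(orders):
    """Count how many orders contain each pair of items."""
    pairs = Counter()
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
    return pairs


def also_bought(item, pairs, top=2):
    """Return the items most often bought in the same order as `item`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [name for name, _ in scores.most_common(top)]


pairs = count_pairs(orders)
print(also_bought("widget", pairs))  # what to suggest alongside a widget
```

With this made-up history, the washer comes out as the top suggestion for someone buying a widget, which matches the example above.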
Search Services – these Services enhance and supplement existing search services to help you find what you are looking for more quickly, with features such as type-ahead prediction and refinement of image search categories.
In addition, there is Bing News Search, which will search news articles and give you summaries, images and related news based on your search criteria. Bing Video Search will do something similar for video, showing live previews, trending videos and other metadata.
As an example, you could do a video search for all Adele songs, then filter by ‘free’ (rather than paid) and by videos added this week. The results will then show a thumbnail of each video found, which starts playing when you hover over it.
If you have made it this far, thanks! You might also be thinking some of these things have been available for a while now, and you would be right.
The big deal about Cognitive Services is that you can now embed and use them in your own applications and web sites, taking features that were previously only available to the likes of Microsoft, Google and Amazon and using them yourself with a few lines of code.
Think of them as Lego blocks all laid out in front of you. The picture on the Lego box might suggest how to use them, but the possibilities of what you build are endless, and you will likely come up with something nobody else has thought of!
For further information on Cognitive Services, see the following link: https://www.microsoft.com/cognitive-services
Nigel Willson – You can follow me on Twitter or LinkedIn using these links
All views expressed are my own and not those of my employer.