AI technologies are set to impact the media sector at all stages, from media production to delivery. Part of this revolution has already reached our homes: voice control with Apple TV, creative photo editing with Prisma, Snapchat lenses.
Perhaps the most successful to date has been Netflix's use of big data and detailed analytics, giving the SVOD service “a significant advantage when it comes to the ability to accurately target viewers with highly targeted shows,” says Futuresource analyst David Sidebottom.
Increasingly, the interface for content discovery, and for the wider Internet of Things in the home, will be voice, via virtual assistants like Amazon Alexa, where voice biometrics and improved contextual understanding will become points of differentiation for AI platforms.
Here are nine other examples of AI’s development.
Automated alt text and lip reading
Consuming and creating visual content online poses challenges for people who are blind or severely visually impaired.
Facebook’s Automatic alt (alternative) text generates an audio description of a still photo (not yet video) using object recognition technology based on a neural network that, according to Facebook, “has billions of parameters and is trained with millions of examples.”
Facebook launched it on iOS screen readers for English and plans to add the functionality for other languages and platforms.
People with hearing impairments can benefit from automatic subtitles. Researchers at Oxford University’s Department of Computer Science developed an AI system, LipNet, capable of discerning speech from silent video clips with a 93.4% success rate, versus 52.3% for professional lip-readers.
According to James Condliffe of MIT Technology Review, LipNet analyses the whole sentence rather than individual words, enabling it to gain an understanding of context (there are fewer mouth shapes than there are sounds produced by the human voice).
Another Oxford University study, reported in New Scientist, trained Google’s DeepMind AI on a series of video clips featuring a broader range of language and greater variation in lighting and head positions. The AI identified 46.8% of words correctly, trumping humans, who managed just 12.4%.
As Condliffe points out, it’s not hard to imagine potential applications for such software. In the future, Skype could fill in the gaps when a caller is in a noisy environment, say, or people with hearing difficulties could hold their smartphone up to ‘hear’ what someone is saying.
Automatic subtitles – cost saving
BBC R&D has road-tested technology that automatically recovers subtitles for clips excerpted from broadcast programmes. Around 850 of the 1,000 clips used to create an app celebrating Sir David Attenborough’s ‘Story of Life’ feature subtitles recovered from the BBC archive without human intervention.
As the technology’s developer Mike Armstrong points out, subtitling more than 1,000 video clips from scratch would have been challenging for a team working on a tight budget.
Audio was extracted from each clip and passed through a speech to text engine to create a transcript. This was turned into a series of search strings that were combined with the clip metadata to locate a subtitle file for the TV programme which best matched the clip. Further processing then worked out where the clip came from in the programme and extracted the relevant subtitles.
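The matching stage of that pipeline can be sketched in a few lines. The functions and data shapes below are hypothetical illustrations, not BBC R&D’s actual code: standard-library fuzzy matching stands in for the search-string and alignment steps.

```python
from difflib import SequenceMatcher

def best_subtitle_match(transcript, subtitle_files):
    """Pick the archive subtitle file whose text best matches the
    speech-to-text transcript of a clip (hypothetical data shapes:
    each file is a dict with 'name' and 'text' keys)."""
    def score(candidate_text):
        return SequenceMatcher(None, transcript.lower(),
                               candidate_text.lower()).ratio()
    return max(subtitle_files, key=lambda f: score(f["text"]))

def locate_clip(transcript, programme_text, window=None):
    """Slide a window over the full programme's subtitle text to find
    where the clip came from, so the relevant cues can be extracted."""
    window = window or len(transcript)
    best_pos, best_ratio = 0, 0.0
    step = max(1, window // 4)
    for pos in range(0, max(1, len(programme_text) - window + 1), step):
        ratio = SequenceMatcher(None, transcript,
                                programme_text[pos:pos + window]).ratio()
        if ratio > best_ratio:
            best_pos, best_ratio = pos, ratio
    return best_pos, best_ratio
```

A real system would match against timecoded subtitle cues and cope with re-edited programme versions, which is where the manual fixes below came in.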
Around 200 clips needed manual editing due to inconsistencies in the data, caused in some instances by the algorithm failing to match UK programme versions with international ones.
“The challenge of recovering subtitle files has been valuable in proving the effectiveness of the technique and provided it with its first public exposure,” says Armstrong.
Smart image editing

A number of recent developments in machine learning research will allow picture and movie content to be edited with Photoshop-like tools that manipulate conceptual elements of an image rather than individual pixels. It will soon be possible to directly edit facial expressions and facial features. The Twitter account @smilevector may not be the most technically advanced example, but it demonstrates the possibilities; the Neural Photo Editing tool is more sophisticated.
It will also be possible to remove unwanted objects with a technique called in-painting, using a simple point-and-click interface. There’s consumer software for this already, and an academic paper on the subject from researchers at Stanford.
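The core idea of in-painting is filling a masked region using the surrounding pixels. The toy sketch below uses naive diffusion (repeatedly averaging neighbours into the hole); it is not the Stanford researchers’ method, which, like commercial tools, uses far more sophisticated patch-based or learned approaches.

```python
def inpaint(image, mask, iterations=200):
    """Very naive 'in-painting': repeatedly replace each masked pixel
    with the mean of its 4-neighbours, diffusing surrounding grey
    levels into the hole. `image` is a 2-D list of floats, `mask` a
    2-D list of booleans marking pixels to fill."""
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]
    for _ in range(iterations):
        nxt = [row[:] for row in img]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    neigh = [img[ny][nx] for ny, nx in
                             ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]
                    nxt[y][x] = sum(neigh) / len(neigh)
        img = nxt
    return img
```

Diffusion only produces smooth fills; removing a person from a textured background is what needs the learned, context-aware models the research describes.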
Digital product placement

As digital video recorders and on-demand video proliferate, advertisers face challenges from viewers who skip their commercials or ignore traditional online ads. Product placement might hold the answer – but not in its traditional form, which is complicated and time-consuming (an advertiser might need to commit a year in advance). It’s also a gamble: what if the spot ends up on the cutting room floor, or turns out not to be to the advertiser’s liking?
London-based MirriAd deploys technology that digitally places brands into content in real time, using demographic data for targeting. It includes a planar tracker able to recognise the lighting characteristics of specific zones in a video, and embeds an object (a drinks can in a fridge, for example) into the image taking the lighting into account. Samsung used the technology to advertise its home appliances within fifty episodes of dramas streamed on the Chinese OTT service Youku.
Cybersecurity and anti-piracy

Traditional cybersecurity approaches are typically reactive: as a new threat is discovered, new rules and countermeasures are added to the set of techniques available to the cybersecurity software.
“As attacks grow in complexity and scale (sometimes involving millions of compromised machines), AI is being deployed to discover new attacks without human supervision by identifying anomalies in the distributed traffic patterns (in terms of content, frequency, and synchronicity of traffic),” explains Pietro Berkes, Principal Data Scientist, Nagra Insight, Kudelski Group. “In contrast to the traditional methods, this AI approach allows reacting to attacks that had not been previously encountered.”
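A deployed system would use unsupervised models over content, frequency and synchronicity of traffic, as Berkes describes. As a minimal illustration of the underlying idea, the toy detector below flags minutes whose request volume deviates sharply from the norm, with no pre-written rule for any specific attack.

```python
import statistics

def traffic_anomalies(requests_per_minute, threshold=3.0):
    """Flag the indices of minutes whose request count sits more than
    `threshold` standard deviations from the mean -- a toy, univariate
    stand-in for unsupervised anomaly detection on traffic patterns."""
    mean = statistics.fmean(requests_per_minute)
    stdev = statistics.pstdev(requests_per_minute)
    if stdev == 0:
        return []  # perfectly uniform traffic: nothing anomalous
    return [i for i, r in enumerate(requests_per_minute)
            if abs(r - mean) / stdev > threshold]
```

The point of the AI approach is that nothing in the detector encodes a known attack signature; it reacts to whatever is statistically unusual.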
AI is also used to address fraud involving complex data, such as video. In order to identify pirated video, algorithms need to match it to legitimate content, even if the video has been distorted (e.g. cropped, reshaped, or had its colours altered). NAGRA recently announced the launch of a media services offering that already makes use of video recognition technology to identify illegal streams.
Predicting churn

Understanding subscriber behaviour is critical for many key activities: retaining customers, planning marketing campaigns and promotions, negotiating licensing rights and so on. In this context, according to Berkes, traditional business intelligence approaches quickly reach their limits, as the outcome of these activities does not depend on any one factor.
Take the fight against churn as an example. A predictive AI algorithm can take hundreds of features into account in order to compute the probability of churning: viewing patterns, purchase frequency, demographic data, devices used, geographical area.
“AI can even help understanding why a user is churning, and suggest the most effective action to take in order to avoid churn, based on past experience with similar subscribers,” says Berkes.
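A common shape for such a predictor is a logistic model over subscriber features. The sketch below is a hypothetical reduction to four hand-picked features with invented weights; a real system would learn hundreds of weights from historical subscriber data rather than hard-coding them.

```python
import math

# Hypothetical feature weights, for illustration only -- a production
# model would fit these from labelled churn history.
WEIGHTS = {
    "weekly_viewing_hours": -0.15,      # heavy viewers churn less
    "days_since_last_purchase": 0.03,   # long gaps signal disengagement
    "devices_registered": -0.10,        # multi-device households stick
    "support_tickets_last_90d": 0.40,   # friction drives cancellations
}
BIAS = -1.0

def churn_probability(subscriber):
    """Logistic scoring over a subscriber's feature dict: the linear
    combination of features is squashed into a 0..1 probability."""
    z = BIAS + sum(WEIGHTS[k] * subscriber.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))
```

Inspecting which weighted features push a given subscriber’s score up is also how such a model can suggest why someone is likely to churn, per Berkes’s point above.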
AI-assisted CRM

Customer relationship management (CRM) is key to industry sales and marketing, and a logical extension of the big data being hoovered up by organisations about their customers is to have it processed by machine. Oracle and Microsoft are developing AI-assisted CRM software, but it is Salesforce which has the lead. Last September it announced Einstein, an AI that learns from CRM data, email, calendar, social, ERP and IoT sources, and delivers predictions and recommendations in the context of what the business is trying to do.
It got there with a team of 175 data scientists and a string of AI technology acquisitions including MetaMind. In sales, for example, expect Einstein features that provide insights into sales opportunities, offer predictive lead scoring, recommend connections and next steps, and automate simple tasks such as logging phone calls and other interactions with customers.
Compression and super-resolution

Techniques to beat traditional encoding and decoding will permit the transmission of high-quality media content even in regions with low internet and mobile bandwidth. Artificial neural networks (ANNs) are being used not only to build better compression methods but also to artificially clean up and increase the resolution of transmitted images (a technique known as ‘super-resolution’).
In its bid to become the number one provider of live video, Twitter faces a non-trivial content distribution issue: a large proportion of its users connect from mobile devices, often over low-bandwidth mobile connections (even more so in markets outside Europe and the US).
Twitter acquired Magic Pony Technology in June 2016 for $150m to develop ways of reconstructing HD video from a low-definition, compressed stream. Super-resolution enables Twitter to transmit content in low definition (thus consuming less bandwidth) using standard encoders and decoders; the ANNs clean up the compression artefacts and upsample images to higher resolution without the result looking pixelated.
“The ANNs are trained with several hours of HD video and learn what typical images look like; more technically, they build a complex model of the statistics of natural images,” explains Nagra’s Pietro Berkes. “Given a corrupted, low-resolution image, they are able to remove all the defects that make it ‘atypical’ by comparing it to this statistical model.”
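For contrast, the classical baseline that learned super-resolution improves upon is plain interpolation. The sketch below implements bilinear upsampling on a 2-D list of grey levels; interpolation can only smooth between known pixels, whereas a trained network can add plausible detail drawn from its statistical model of natural images.

```python
def bilinear_upsample(image, factor=2):
    """Classical bilinear upsampling: each output pixel is a weighted
    average of the four nearest input pixels. `image` is a 2-D list
    of floats; returns an image `factor` times larger per side."""
    h, w = len(image), len(image[0])
    H, W = h * factor, w * factor
    out = [[0.0] * W for _ in range(H)]
    for Y in range(H):
        for X in range(W):
            y, x = Y / factor, X / factor          # source coordinates
            y0, x0 = min(int(y), h - 1), min(int(x), w - 1)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bot = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            out[Y][X] = top * (1 - dy) + bot * dy
    return out
```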
Twitter is also working to develop better image encoding methods to outperform common JPEG standards, as is Google.
“A significant advantage of AI over traditional codecs is that they can be focussed on a particular subset of content, like nature documentaries, and trained to apply a compression model that is specific to this content, potentially saving more bandwidth,” says Berkes.
When systems go wrong
Microsoft trained its chatbot Tay on Twitter last year with disastrous results: it began spewing racist and sexist tweets, proving that AI systems are only as smart and benevolent as the training data used to teach them. It was also a customer relations nightmare.
It’s why a group of luminaries, including entrepreneur Elon Musk and Facebook’s AI chief Yann LeCun, were among 2,000 signatories to a set of guidelines published earlier this year – the 23 Asilomar AI Principles – aimed ultimately at protecting humankind from rogue AI.
The guidelines dug into the ethics of AI and even included a principle aimed at averting a Terminator-like “arms race in lethal autonomous weapons”.
Others were more prosaic. Principle 12, on personal privacy, states: “People should have the right to access, manage and control the data they generate, given AI systems’ power to analyze and utilize that data.”
Whether commercial organizations will take heed of this as they compete for business is moot. Just don’t mention SkyNet.
— Adrian Pennington is a freelance editor, journalist and copywriter.