NLLanguageRecognizer

One of my favorite activities, when I travel, is to listen to people as they pass and try to guess what language they’re speaking. I’d like to think that I’ve gotten pretty good at it over the years (though I rarely get to know if I guessed right).

If I’m lucky, I’ll recognize a word or phrase as a cognate of a language I’m familiar with, and narrow things down from there. Otherwise, I try to build up a phonetic inventory, listening for what kinds of sounds are present. For instance, is the speaker mostly using voiced alveolar trills ⟨r⟩, flaps ⟨ɾ⟩, or postalveolar approximants ⟨ɹ⟩? Are the vowels mostly open / close; front / back? Any unusual sounds, like ⟨ʇ⟩?

…or at least that’s what I think I do. To be honest, all of this happens unconsciously and automatically – for all of us, and for all manner of language recognition tasks. And we have only the faintest idea of how we get from input to output.

Computers operate in a similar manner. After many hours of training, machine learning models can predict the language of text with accuracy far exceeding previous attempts from a formalized top-down approach.

Machine learning has been at the heart of natural language processing in Apple platforms for many years, but it’s only recently that external developers have been able to harness it directly.

New in iOS 12 and macOS 10.14, the Natural Language framework refines existing linguistic APIs and exposes new functionality to developers.

NLTagger is NSLinguisticTagger with a new attitude. NLTokenizer is a replacement for enumerateSubstrings(in:options:using:) (née CFStringTokenizer). NLLanguageRecognizer extends the functionality previously exposed through the dominantLanguage property of NSLinguisticTagger, with the ability to provide hints and get additional predictions.

Recognizing the Language of Natural Language Text

Here’s how to use NLLanguageRecognizer to guess the dominant language of natural language text:

import NaturalLanguage

let string = """
私はガラスを食べられます。それは私を傷つけません。
"""

let recognizer = NLLanguageRecognizer()
recognizer.processString(string)
recognizer.dominantLanguage // ja

First, create an instance of NLLanguageRecognizer and call the method processString(_:) passing a string. From there, the dominantLanguage property returns an NLLanguage object containing the BCP-47 language tag of the predicted language (for example "ja" for 日本語 / Japanese).
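Because dominantLanguage is an optional, it's worth deciding up front what to do when no prediction is available. Here's a minimal sketch of one way to wrap that: languageCode(for:fallback:) is a hypothetical helper (not part of the framework), the "und" fallback is our own choice, and the canImport check is only there so the snippet still compiles on platforms without the Natural Language framework.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical helper: returns the BCP 47 tag of the predicted language,
/// or `fallback` when the recognizer has no prediction to offer.
func languageCode(for text: String, fallback: String = "und") -> String {
    #if canImport(NaturalLanguage)
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    // dominantLanguage is optional, so substitute the fallback for nil
    return recognizer.dominantLanguage?.rawValue ?? fallback
    #else
    // Assumption: without the framework there is nothing to ask,
    // so the fallback is always returned
    return fallback
    #endif
}

print(languageCode(for: "私はガラスを食べられます。それは私を傷つけません。"))
```

If you only need a one-off guess, the class method NLLanguageRecognizer.dominantLanguage(for:) does the same thing without creating an instance yourself.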

Getting Multiple Language Hypotheses

If you studied linguistics in college or joined the Latin club in high school, you may be familiar with some fun examples of polylingual homonymy between dialectic Latin and modern Italian.

For example, consider the readings of the following sentence:

CANE NERO MAGNA BELLA PERSICA!

Language	Translation
Latin	Sing, o Nero, the great Persian wars!
Italian	The black dog eats a nice peach!

To the chagrin of Max Fisher, Latin isn’t one of the languages supported by NLLanguageRecognizer, so any examples of confusable languages won’t be nearly as entertaining.

With some experimentation, you’ll find that it’s quite difficult to get NLLanguageRecognizer to guess incorrectly, or even with low precision. Beyond giving it a single cognate shared across members of a language family, it’s often able to get past 2σ to 95% certainty with a handful of words.

After some trial and error, we were finally able to get NLLanguageRecognizer to guess incorrectly for a string of non-trivial length by passing Article I of the Universal Declaration of Human Rights in Norwegian Bokmål:

let string = """
Alle mennesker er født frie og med samme menneskeverd og menneskerettigheter.
De er utstyrt med fornuft og samvittighet og bør handle mot hverandre i brorskapets ånd.
"""

let languageRecognizer = NLLanguageRecognizer()
languageRecognizer.processString(string)
languageRecognizer.dominantLanguage // da (!)

The Universal Declaration of Human Rights is among the most widely translated documents in the world, with translations in over 500 different languages. For this reason, it’s often used for natural language tasks.

Danish and Norwegian Bokmål are very similar languages to begin with, so it’s unsurprising that NLLanguageRecognizer guessed incorrectly. (For comparison, here’s the equivalent text in Danish.)

We can use the languageHypotheses(withMaximum:) method to get a sense of how confident the dominantLanguage guess was:

languageRecognizer.languageHypotheses(withMaximum: 2)

Language	Confidence
Danish (`da`)	56%
Norwegian Bokmål (`nb`)	43%
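The method returns a dictionary keyed by NLLanguage, so getting from there to a ranked list like the table above takes a little massaging. The following is a rough sketch of one way to do it: rankedHypotheses(_:) is a hypothetical helper of our own, and rounding to whole percentages is just a presentation choice. The canImport check only exists so the pure formatting code compiles on platforms without the framework.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical helper: orders (language tag, confidence) pairs from most
/// to least likely and formats each confidence as a whole percentage.
func rankedHypotheses(_ hypotheses: [String: Double]) -> [String] {
    return hypotheses
        .sorted { $0.value > $1.value }
        .map { "\($0.key): \(Int(($0.value * 100).rounded()))%" }
}

#if canImport(NaturalLanguage)
let recognizer = NLLanguageRecognizer()
recognizer.processString("Alle mennesker er født frie og med samme menneskeverd og menneskerettigheter.")

// Re-key the [NLLanguage: Double] result by its BCP 47 tag
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
let byTag = Dictionary(uniqueKeysWithValues: hypotheses.map { ($0.key.rawValue, $0.value) })

rankedHypotheses(byTag).forEach { print($0) }
#endif
```

The exact confidences you see may differ from the table, since they depend on the input and on the model that ships with the OS.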

At the time of writing, the languageHints property is undocumented, so it’s unclear how exactly it should be used. However, passing a weighted dictionary of probabilities seems to have the desired effect of bolstering the hypotheses with known priors:

languageRecognizer.languageHints = [.danish: 0.25, .norwegian: 0.75]

Language	Confidence (with Hints)
Danish (`da`)	30%
Norwegian Bokmål (`nb`)	70%
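In practice, priors like these might come from something you already know about the user, such as how often they've written in each language before. Here's a tentative sketch of turning raw counts into hint weights: normalized(_:) is a hypothetical helper, and scaling the weights so they sum to 1 is our own assumption rather than a documented requirement of languageHints.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical helper: scales non-negative weights so they sum to 1.
/// Returns an empty dictionary when there is nothing to scale.
func normalized(_ weights: [String: Double]) -> [String: Double] {
    let total = weights.values.reduce(0, +)
    guard total > 0 else { return [:] }
    return weights.mapValues { $0 / total }
}

// Example counts (made up for illustration): 1 Danish message, 3 Norwegian
let priors = normalized(["da": 1, "nb": 3])

#if canImport(NaturalLanguage)
let recognizer = NLLanguageRecognizer()

// Convert the tags back into NLLanguage keys before assigning the hints
recognizer.languageHints = Dictionary(
    uniqueKeysWithValues: priors.map { (NLLanguage(rawValue: $0.key), $0.value) }
)

recognizer.processString("Alle mennesker er født frie og med samme menneskeverd og menneskerettigheter.")
print(recognizer.languageHypotheses(withMaximum: 2))
#endif
```

If you know for certain that the text can only be in a handful of languages, the languageConstraints property is the more direct tool, since it limits the set of languages the recognizer will return.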

So what can you do once you know the language of a string?

Here are a couple of use cases for your consideration:

Checking Misspelled Words

Combine NLLanguageRecognizer with UITextChecker to check the spelling of words in any string:

Start by creating an NLLanguageRecognizer and initializing it with a string by calling the processString(_:) method:

let string = """
Wenn ist das Nunstück git und Slotermeyer?
Ja! Beiherhund das Oder die Flipperwaldt gersput!
"""

let languageRecognizer = NLLanguageRecognizer()
languageRecognizer.processString(string)
let dominantLanguage = languageRecognizer.dominantLanguage! // de

Then, pass the rawValue of the NLLanguage object returned by the dominantLanguage property to the language parameter of rangeOfMisspelledWord(in:range:startingAt:wrap:language:):

import UIKit

let textChecker = UITextChecker()

let nsString = NSString(string: string)
let stringRange = NSRange(location: 0, length: nsString.length)
var offset = 0

repeat {
    // Find the next misspelled word at or after the current offset
    let wordRange = textChecker.rangeOfMisspelledWord(in: string, range: stringRange, startingAt: offset, wrap: false, language: dominantLanguage.rawValue)
    guard wordRange.location != NSNotFound else {
        break
    }

    print(nsString.substring(with: wordRange))

    // Continue searching just past the word that was found
    offset = wordRange.upperBound
} while true

When passed The Funniest Joke in the World, the following words are called out for being misspelled:

Nunstück
Slotermeyer
Beiherhund
Flipperwaldt
gersput

Synthesizing Speech

You can use NLLanguageRecognizer in concert with AVSpeechSynthesizer to hear any natural language text read aloud:

import AVFoundation
import NaturalLanguage

let string = """
Je m'baladais sur l'avenue le cœur ouvert à l'inconnu
J'avais envie de dire bonjour à n'importe qui.
N'importe qui et ce fut toi, je t'ai dit n'importe quoi
Il suffisait de te parler, pour t'apprivoiser.
"""

let languageRecognizer = NLLanguageRecognizer()
languageRecognizer.processString(string)
let language = languageRecognizer.dominantLanguage!.rawValue // fr

let speechSynthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: string)
utterance.voice = AVSpeechSynthesisVoice(language: language)
speechSynthesizer.speak(utterance)

It doesn’t have the lyrical finesse of Joe Dassin, but ainsi va la vie.

In order to be understood, we first must seek to understand. And the first step to understanding natural language is to determine its language.

NLLanguageRecognizer offers a powerful new interface to functionality that’s been responsible for intelligent features throughout iOS and macOS. See how you might take advantage of it in your app to gain new understanding of your users.
