Strings are a ubiquitous and diverse part of our computing lives. They comprise emails and essays, poems and novels—and indeed, every article on nshipster.com, the configuration files that shape the site, and the code that builds it.
Being able to pull apart strings and extract particular bits of data is therefore a powerful skill, one that we use over and over building apps and shaping our tools. Cocoa provides a powerful set of tools to handle string processing. In particular:
- string.componentsSeparatedByCharactersInSet / string.componentsSeparatedByString: Great for splitting a string into constituent pieces. Not so great at anything else.
- NSRegularExpression: Powerful for validating and extracting string data from an expected format. Cumbersome when dealing with complex serial input and finicky for parsing numeric values.
- NSDataDetector: Perfect for detecting and extracting dates, addresses, links, and more. Limited to its predefined types.
- NSScanner: Highly configurable and designed for scanning string and numeric values from loosely demarcated strings.
This week’s article focuses on the last of these, NSScanner. Read on to learn about its flexibility and power.
Among Cocoa’s tools, NSScanner serves as a wrapper around a string, scanning through its contents to efficiently retrieve substrings and numeric values. It offers several properties that modify an NSScanner instance’s behavior:
- caseSensitive (Bool): Whether to pay attention to upper- or lower-case while scanning. Note that this property only applies to the string-matching methods scanString:intoString: and scanUpToString:intoString:. Character set scanning is always case-sensitive.
- charactersToBeSkipped (NSCharacterSet): The characters to skip over on the way to finding a match for the requested value type.
- scanLocation (Int): The current position of the scanner in its string. Scanning can be rewound or restarted by setting this property.
- locale (NSLocale): The locale that the scanner should use when parsing numeric values (see below).
An NSScanner instance has two additional read-only properties: string, which gives you back the string the scanner is scanning; and atEnd, which is true if scanLocation is at the end of the string.
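To make those properties a little more concrete, here is a minimal sketch of how they interact. It is not part of the original example code: the sample string and variable names are made up for illustration, and it uses the same Swift 1.2-era syntax as the rest of this article.

```swift
import Foundation

// Hypothetical sample input, chosen only for illustration.
let scanner = NSScanner(string: "Hello, World")
var word: NSString?

// With caseSensitive off, string matching ignores case.
scanner.caseSensitive = false
let matchedLoosely = scanner.scanString("hello", intoString: &word)   // true
let positionAfterMatch = scanner.scanLocation                         // 5, just past "Hello"

// Rewind to the start and try again with case-sensitive matching.
scanner.scanLocation = 0
scanner.caseSensitive = true
let matchedStrictly = scanner.scanString("hello", intoString: &word)  // false: "H" is not "h"

// The failed scan did not move the scanner.
let positionAfterFailure = scanner.scanLocation                       // 0
let finished = scanner.atEnd                                          // false
```

Setting scanLocation back to zero is all it takes to run a second pass over the same string with different settings.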
Note: NSScanner is actually the abstract superclass of a private cluster of scanner implementation classes. Even though you’re calling alloc and init on NSScanner, you’ll actually receive one of these subclasses, such as NSConcreteScanner. No need to fret over this.
Extracting Substrings and Numeric Values
The raison d'être of NSScanner is to pull substrings and numeric values from a larger string. It has fifteen methods to do this, all of which follow the same basic pattern. Each method takes a reference to an output variable as a parameter and returns a boolean value indicating success or failure of the scan:
Swift:

    let whitespaceAndPunctuationSet = NSMutableCharacterSet.whitespaceAndNewlineCharacterSet()
    whitespaceAndPunctuationSet.formUnionWithCharacterSet(NSCharacterSet.punctuationCharacterSet())

    let stringScanner = NSScanner(string: "John & Paul & Ringo & George.")
    stringScanner.charactersToBeSkipped = whitespaceAndPunctuationSet

    // using the latest Swift 1.2 beta 2 syntax:
    var name: NSString?
    while stringScanner.scanUpToCharactersFromSet(whitespaceAndPunctuationSet, intoString: &name) {
        // unwrap the optional so the name prints without the Optional(...) wrapper
        if let name = name {
            println(name)
        }
    }
    // John
    // Paul
    // Ringo
    // George
Objective-C:

    NSMutableCharacterSet *whitespaceAndPunctuationSet = [NSMutableCharacterSet punctuationCharacterSet];
    [whitespaceAndPunctuationSet formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    NSScanner *stringScanner = [[NSScanner alloc] initWithString:@"John & Paul & Ringo & George."];
    stringScanner.charactersToBeSkipped = whitespaceAndPunctuationSet;

    NSString *name;
    while ([stringScanner scanUpToCharactersFromSet:whitespaceAndPunctuationSet intoString:&name]) {
        NSLog(@"%@", name);
    }
    // John
    // Paul
    // Ringo
    // George
The NSScanner API has methods for two use-cases: scanning for strings generally, or for numeric types specifically.
1) String Scanners
scanString:intoString: / scanCharactersFromSet:intoString:
- Scans to match the string parameter or characters in the NSCharacterSet parameter, respectively. On return, the intoString parameter contains the scanned string, if found. These methods are often used to advance the scanner’s location; pass nil for the intoString parameter to ignore the output.

scanUpToString:intoString: / scanUpToCharactersFromSet:intoString:
- Scans characters into a string until finding the string parameter or characters in the NSCharacterSet parameter, respectively. On return, the intoString parameter contains the scanned string, if any was found. If the given string or character set is not found, the result will be the entire rest of the scanner’s string.
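Here is a small sketch of the two string-scanning styles working together, splitting a "key: value" line. The header-style sample string and the variable names are assumptions made up for this illustration, written in the same Swift 1.2-era syntax as the article’s other snippets.

```swift
import Foundation

// Hypothetical input: a single "key: value" line.
let headerScanner = NSScanner(string: "Content-Type: text/html")
var field: NSString?
var value: NSString?

// Collect everything before the colon.
let foundField = headerScanner.scanUpToString(":", intoString: &field)
// field is now "Content-Type"

// Step over the separator; passing nil discards the matched text.
let foundColon = headerScanner.scanString(":", intoString: nil)

// There is no newline in the input, so this returns the rest of the string.
// The space after the colon is skipped by the default charactersToBeSkipped.
let foundValue = headerScanner.scanUpToString("\n", intoString: &value)
// value is now "text/html"
```

Note that the "scan up to" call still succeeds when its target never appears, which is what makes it handy for grabbing the tail end of a line.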
2) Numeric Scanners
scanDouble: / scanFloat: / scanDecimal:
- Scans a floating-point value from the scanner’s string and returns the value in the referenced Double, Float, or NSDecimal instance, if found.

scanInteger: / scanInt: / scanLongLong: / scanUnsignedLongLong:
- Scans an integer value from the scanner’s string and returns the value in the referenced Int, Int32, Int64, or UInt64 instance, if found.

scanHexDouble: / scanHexFloat:
- Scans a hexadecimal floating-point value from the scanner’s string and returns the value in the referenced Double or Float instance, if found. To scan properly, the floating-point value must have a 0x or 0X prefix.

scanHexInt: / scanHexLongLong:
- Scans a hexadecimal integer value from the scanner’s string and returns the value in the referenced UInt32 or UInt64 instance, if found. The value may have a 0x or 0X prefix, but it is not required.
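As a quick sketch of the numeric scanners side by side, the snippet below pulls a decimal integer, a hexadecimal integer, and a floating-point value out of one string. The input and variable names are invented for illustration, and the syntax matches the Swift 1.2-era code used elsewhere in this article.

```swift
import Foundation

// Hypothetical input mixing three numeric formats, separated by spaces.
let numberScanner = NSScanner(string: "42 0x2A 3.5")

var count = 0           // Int, filled by scanInteger
var mask: UInt32 = 0    // UInt32, filled by scanHexInt
var ratio = 0.0         // Double, filled by scanDouble

// Whitespace is skipped by default, so the three scans can run back to back.
let gotCount = numberScanner.scanInteger(&count)   // count is 42
let gotMask = numberScanner.scanHexInt(&mask)      // mask is 42 (0x2A)
let gotRatio = numberScanner.scanDouble(&ratio)    // ratio is 3.5
```

Each output variable has to be declared with the exact type its method expects, which is why mask is spelled out as a UInt32 rather than left to type inference.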
localizedScannerWithString / locale
Because it is a part of Cocoa, NSScanner has built-in localization support (of course). An NSScanner instance can work with either the user’s locale when created via +localizedScannerWithString:, or a specific locale after setting its locale property. In particular, the separator for floating-point values will be correctly interpreted based on the given locale:
Swift:

    var price = 0.0
    let gasPriceScanner = NSScanner(string: "2.09 per gallon")
    gasPriceScanner.scanDouble(&price)
    // 2.09

    // use a german locale instead of the default
    let benzinPriceScanner = NSScanner(string: "1,38 pro Liter")
    benzinPriceScanner.locale = NSLocale(localeIdentifier: "de-DE")
    benzinPriceScanner.scanDouble(&price)
    // 1.38
Objective-C:

    double price;
    NSScanner *gasPriceScanner = [[NSScanner alloc] initWithString:@"2.09 per gallon"];
    [gasPriceScanner scanDouble:&price];
    // 2.09

    // use a german locale instead of the default
    NSScanner *benzinPriceScanner = [[NSScanner alloc] initWithString:@"1,38 pro Liter"];
    [benzinPriceScanner setLocale:[NSLocale localeWithLocaleIdentifier:@"de-DE"]];
    [benzinPriceScanner scanDouble:&price];
    // 1.38
Example: Parsing SVG Path Data
To take NSScanner out for a spin, we’ll look at parsing the path data from an SVG path. SVG path data are stored as a string of instructions for drawing the path, where “M” indicates a “move-to” step, “L” stands for “line-to”, and “C” stands for a curve. Uppercase instructions are followed by points in absolute coordinates; lowercase instructions are followed by coordinates relative to the last point in the path.
Here’s an SVG path I happen to have lying around (and a point-offsetting helper we’ll use later):
Swift:

    var svgPathData = "M28.2,971.4c-10,0.5-19.1,13.3-28.2,2.1c0,15.1,23.7,30.5,39.8,16.3c16,14.1,39.8-1.3,39.8-16.3c-12.5,15.4-25-14.4-39.8,4.5C35.8,972.7,31.9,971.2,28.2,971.4z"

    extension CGPoint {
        func offset(p: CGPoint) -> CGPoint {
            return CGPoint(x: x + p.x, y: y + p.y)
        }
    }
Objective-C:

    static NSString *const svgPathData = @"M28.2,971.4c-10,0.5-19.1,13.3-28.2,2.1c0,15.1,23.7,30.5,39.8,16.3c16,14.1,39.8-1.3,39.8-16.3c-12.5,15.4-25-14.4-39.8,4.5C35.8,972.7,31.9,971.2,28.2,971.4z";

    CGPoint offsetPoint(CGPoint p1, CGPoint p2) {
        return CGPointMake(p1.x + p2.x, p1.y + p2.y);
    }
Note that the point data are fairly irregular. Sometimes the x and y values of a point are separated by a comma, sometimes not, and likewise with points themselves. Parsing these data with regular expressions could turn into a mess pretty quickly, but with NSScanner the code is clear and straightforward.
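To see why those irregular separators are no trouble for a scanner, here is a tiny sketch using made-up numbers in the same shape as the path data. It is an illustration only, not part of the parser built below, and uses the same Swift 1.2-era syntax as the rest of the article.

```swift
import Foundation

// Hypothetical sample: a comma in one place, and bare minus signs doing
// double duty as separators in others.
let coordinateScanner = NSScanner(string: "10.5-2.25,3-4")
coordinateScanner.charactersToBeSkipped = NSCharacterSet(charactersInString: ", ")

var value = 0.0
var values: [Double] = []

// Each scanDouble call stops at the first character that cannot extend the
// current number, so "10.5-2.25" is read as 10.5 followed by -2.25.
while coordinateScanner.scanDouble(&value) {
    values.append(value)
}
// values is now [10.5, -2.25, 3.0, -4.0]
```

The scanner never needs to be told where one number ends and the next begins; the minus sign simply starts a new value.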
We’ll define a bezierPathFromSVGPath function that will convert a string of path data into a UIBezierPath. Our scanner is set up to skip commas and whitespace while scanning for values:
Swift:

    func bezierPathFromSVGPath(str: String) -> UIBezierPath {
        let scanner = NSScanner(string: str)

        // skip commas and whitespace
        let skipChars = NSMutableCharacterSet(charactersInString: ",")
        skipChars.formUnionWithCharacterSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        scanner.charactersToBeSkipped = skipChars

        // the resulting bezier path
        var path = UIBezierPath()
Objective-C:

    - (UIBezierPath *)bezierPathFromSVGPath:(NSString *)str {
        NSScanner *scanner = [NSScanner scannerWithString:str];

        // skip commas and whitespace
        NSMutableCharacterSet *skipChars = [NSMutableCharacterSet characterSetWithCharactersInString:@","];
        [skipChars formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        scanner.charactersToBeSkipped = skipChars;

        // the resulting bezier path
        UIBezierPath *path = [UIBezierPath bezierPath];
With the setup out of the way, it’s time to start scanning. We start by scanning for a string made up of characters in the allowed set of instructions:
Swift:

        // instruction codes can be upper- or lower-case
        let instructionSet = NSCharacterSet(charactersInString: "MCSQTAmcsqta")
        var instruction: NSString?

        // scan for an instruction code
        while scanner.scanCharactersFromSet(instructionSet, intoString: &instruction) {
Objective-C:

        // instruction codes can be upper- or lower-case
        NSCharacterSet *instructionSet = [NSCharacterSet characterSetWithCharactersInString:@"MCSQTAmcsqta"];
        NSString *instruction;

        // scan for an instruction code
        while ([scanner scanCharactersFromSet:instructionSet intoString:&instruction]) {
The next section scans for two Double values in a row, converts them to a CGPoint, and then ultimately adds the correct step to the bezier path:
Swift:

            var x = 0.0, y = 0.0
            var points: [CGPoint] = []

            // scan for pairs of Double, adding them as CGPoints to the points array
            while scanner.scanDouble(&x) && scanner.scanDouble(&y) {
                points.append(CGPoint(x: x, y: y))
            }

            // new point for bezier path
            switch instruction ?? "" {
            case "M":
                path.moveToPoint(points[0])
            case "C":
                path.addCurveToPoint(points[2], controlPoint1: points[0], controlPoint2: points[1])
            case "c":
                path.addCurveToPoint(path.currentPoint.offset(points[2]),
                    controlPoint1: path.currentPoint.offset(points[0]),
                    controlPoint2: path.currentPoint.offset(points[1]))
            default:
                break
            }
        }

        return path
    }
Objective-C:

            double x, y;
            NSMutableArray *points = [NSMutableArray array];

            // scan for pairs of Double, adding them as CGPoints to the points array
            while ([scanner scanDouble:&x] && [scanner scanDouble:&y]) {
                [points addObject:[NSValue valueWithCGPoint:CGPointMake(x, y)]];
            }

            // new point in path
            if ([instruction isEqualToString:@"M"]) {
                [path moveToPoint:[points[0] CGPointValue]];
            } else if ([instruction isEqualToString:@"C"]) {
                [path addCurveToPoint:[points[2] CGPointValue]
                        controlPoint1:[points[0] CGPointValue]
                        controlPoint2:[points[1] CGPointValue]];
            } else if ([instruction isEqualToString:@"c"]) {
                CGPoint newPoint = offsetPoint(path.currentPoint, [points[2] CGPointValue]);
                CGPoint control1 = offsetPoint(path.currentPoint, [points[0] CGPointValue]);
                CGPoint control2 = offsetPoint(path.currentPoint, [points[1] CGPointValue]);

                [path addCurveToPoint:newPoint
                        controlPoint1:control1
                        controlPoint2:control2];
            }
        }

        [path applyTransform:CGAffineTransformMakeScale(1, -1)];
        return path;
    }
Lo and behold, the result:
The required flipping, resizing, waxing, and twirling are left as an exercise for the reader.
Swift-Friendly Scanning
As a last note, working with NSScanner in Swift can feel almost silly. Really, NSScanner, I need to pass in a pointer just so you can return a Bool? I can’t use optionals, which are pretty much designed for this exact purpose? Really?
With a simple extension converting the built-in methods to ones returning optional values, scanning becomes far more in sync with Swift’s idiom. Our path data scanning example can now use optional binding instead of inout variables for a cleaner, easier-to-read implementation:
    // look for an instruction code
    while let instruction = scanner.scanCharactersFromSet(instructionSet) {
        var points: [CGPoint] = []

        // scan for pairs of Double, adding them as CGPoints to the points array
        while let x = scanner.scanDouble(), y = scanner.scanDouble() {
            points.append(CGPoint(x: x, y: y))
        }

        // new point for bezier path
        switch instruction {
            // ...
        }
    }
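The extension itself isn’t shown above, so here is a minimal sketch of what it might look like, covering only the two methods the loop relies on. Treat it as one possible shape, not the definitive implementation: the method bodies and the sample input are assumptions, written in the same Swift 1.2-era syntax as the rest of the article.

```swift
import Foundation

extension NSScanner {
    // Optional-returning wrapper: the scanned characters on success, nil on failure.
    func scanCharactersFromSet(set: NSCharacterSet) -> String? {
        var value: NSString?
        if scanCharactersFromSet(set, intoString: &value) {
            return value as? String
        }
        return nil
    }

    // Optional-returning wrapper: the scanned Double on success, nil on failure.
    func scanDouble() -> Double? {
        var value = 0.0
        return scanDouble(&value) ? value : nil
    }
}

// Hypothetical usage on a tiny move-to instruction.
let moveScanner = NSScanner(string: "M 10 20")
let moveSet = NSCharacterSet(charactersInString: "M")

let moveInstruction = moveScanner.scanCharactersFromSet(moveSet)
let moveX = moveScanner.scanDouble()
let moveY = moveScanner.scanDouble()
let nothingLeft = moveScanner.scanDouble()   // nil: the string is exhausted
```

The remaining scan methods can be wrapped in the same way: declare a local variable of the right type, call the pointer-based method, and return the value only when the call reports success.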
You’ve gotta have the right tools for every job. NSScanner can be the shiny tool to reach for when it’s time to parse a user’s input or a web service’s data. Being able to distinguish which tools are right for which tasks helps us on our way to creating clear and accurate code.