Trimming (computer programming): Difference between revisions
m Reverted edit by Funkywizard (talk) to last version by Monkbot |
|||
(22 intermediate revisions by 18 users not shown) | |||
Line 1: | Line 1: | ||
{{ |
{{Refimprove|date=February 2015}} |
||
{{bare URLs|date=June 2013}} |
|||
In [[computer programming]], '''trimming''' ('''trim''') or '''stripping''' ('''strip''') is a [[string manipulation]] in which leading and trailing [[whitespace |
In [[computer programming]], '''trimming''' ('''trim''') or '''stripping''' ('''strip''') is a [[string (computer science)|string manipulation]] in which leading and trailing [[whitespace character|whitespace]] is removed from a [[string (computer science)|string]]. |
||
For example, the string (enclosed by apostrophes) |
For example, the string (enclosed by apostrophes) |
||
< |
<syntaxhighlight lang=text>' this is a test '</syntaxhighlight> |
||
would be changed, after trimming, to |
would be changed, after trimming, to |
||
< |
<syntaxhighlight lang=text>'this is a test'</syntaxhighlight> |
||
==Variants== |
==Variants== |
||
;Left or right trimming |
|||
:The most popular variants of the trim function strip only the beginning or end of the string. Typically named '''ltrim''' and '''rtrim''' respectively, or in the case of Python: '''lstrip''' and '''rstrip'''. C# uses '''TrimStart''' and '''TrimEnd''', and Common Lisp '''string-left-trim''' and '''string-right-trim'''. Pascal and Java do not have these variants built-in, although [[Object Pascal]] (Delphi) has '''TrimLeft''' and '''TrimRight''' functions.<ref>{{cite web|url=http://www.freepascal.org/docs-html/rtl/sysutils/trim.html |title=Trim |publisher=Freepascal.org |date=2013-02-02 |accessdate=2013-08-24}}</ref> |
|||
===Left or right trimming=== |
|||
;Whitespace character list parameterization |
|||
:Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's <code>string-trim</code> function, the parameter (called ''character-bag'') is required. The C++ [[Boost library]] defines space characters according to [[locale]], as well as offering variants with a [[predicate (computer programming)|predicate]] parameter (a [[functor]]) to select which characters are trimmed. |
|||
The most popular variants of the trim function strip only the beginning or end of the string. Typically named '''ltrim''' and '''rtrim''' respectively, or in the case of Python: '''lstrip''' and '''rstrip'''. C# uses '''TrimStart''' and '''TrimEnd''', and Common Lisp '''string-left-trim''' and '''string-right-trim'''. Pascal and Java do not have these variants built-in, although [[Object Pascal]] (Delphi) has '''TrimLeft''' and '''TrimRight''' functions.<ref>{{cite web|url=http://www.freepascal.org/docs-html/rtl/sysutils/trim.html |title=Trim |publisher=Freepascal.org |date=2013-02-02 |access-date=2013-08-24}}</ref> |
|||
;Special empty string return value |
|||
:An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, [[Jakarta Project|Apache Jakarta]]'s '''StringUtils''' has a function called <code>stripToNull</code> which returns <code>null</code> in place of an empty string. |
|||
===Whitespace character list parameterization=== |
|||
;Space normalization |
|||
:Space normalization is a related string manipulation where in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named <code>Trim()</code> in spreadsheet applications (including [[Microsoft Excel|Excel]], [[OpenOffice.org Calc|Calc]], [[Gnumeric]], and [[Google Docs]]), and by the <code>normalize-space()</code> function in [[XSL Transformations|XSLT]] and [[XPath]], |
|||
Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's <code>string-trim</code> function, the parameter (called ''character-bag'') is required. The C++ [[Boost library]] defines space characters according to [[Locale (computer software)|locale]], as well as offering variants with a [[predicate (computer programming)|predicate]] parameter (a [[functor]]) to select which characters are trimmed. |
|||
;In-place trimming |
|||
:While most algorithms return a new (trimmed) string, some alter the original string [[in-place]]. Notably, the [[Boost library]] allows either in-place trimming or a trimmed copy to be returned. |
|||
===Special empty string return value=== |
|||
==Definition of whitespace== |
|||
The characters which are considered whitespace varies between programming languages and implementations. For example, C traditionally only counts space, tab, line feed, and carriage return characters, while languages which support [[Unicode]] typically include all Unicode space characters. Some implementations also include [[ASCII]] control codes (non-printing characters) along with whitespace characters. |
|||
An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, [[Jakarta Project|Apache Jakarta]]'s '''StringUtils''' has a function called <code>stripToNull</code> which returns <code>null</code> in place of an empty string. |
|||
Java's trim method considers ASCII spaces and control codes as whitespace, contrasting with the Java <code>isWhitespace()</code> method,<ref>{{cite web|url=http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isWhitespace(char) |title=Character (Java 2 Platform SE 5.0) |publisher=Java.sun.com |date= |accessdate=2013-08-24}}</ref> which recognizes all Unicode space characters. |
|||
===Space normalization=== |
|||
Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace. |
|||
Space normalization is a related string manipulation in which, in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named <code>Trim()</code> in spreadsheet applications (including [[Microsoft Excel|Excel]], [[OpenOffice.org Calc|Calc]], [[Gnumeric]], and [[Google Docs]]), and by the <code>normalize-space()</code> function in [[XSL Transformations|XSLT]] and [[XPath]]. |
|||
==Usage== |
|||
Following are examples of trimming a string using several programming languages. All of the implementations shown return a new string and do not alter the original variable. |
|||
===In-place trimming=== |
|||
{| class="wikitable" |
|||
|- style="text-align:left;" |
|||
! Example usage !! Languages |
|||
|- |
|||
| <tt>''String''.Trim([''chars''])</tt> |
|||
|[[C Sharp (programming language)|C#]], [[VB.NET]], [[Windows PowerShell]] |
|||
|- |
|||
| <tt>''string''.strip();</tt> |
|||
|[[D (programming language)|D]] |
|||
|- |
|||
| <tt>(.trim ''string'')</tt> |
|||
|[[Clojure]] |
|||
|- |
|||
|<tt>''sequence'' [ predicate? ] trim</tt> |
|||
|[[Factor (programming language)|Factor]] |
|||
|- |
|||
| <tt>(string-trim '(#\Space #\Tab #\Newline) ''string'')</tt> |
|||
|[[Common Lisp]] |
|||
|- |
|||
| <tt>(string-trim ''string'')</tt> |
|||
|[[Scheme (programming language)|Scheme]] |
|||
|- |
|||
| <tt>''string''.trim()</tt> |
|||
|[[Java (programming language)|Java]], [[JavaScript]] (1.8.1+, Firefox 3.5+) |
|||
|- |
|||
| <tt>Trim(''String'')</tt> |
|||
|[[Pascal (programming language)|Pascal]],<ref>{{cite web|url=http://gnu-pascal.de/gpc-hr/Trim.html |title=Trim - GNU Pascal priručnik |publisher=Gnu-pascal.de |date= |accessdate=2013-08-24}}</ref> [[QBasic]], [[Visual Basic]], [[Delphi programming language|Delphi]] |
|||
|- |
|||
| <tt>''string''.strip()</tt> |
|||
|[[Python (programming language)|Python]] |
|||
|- |
|||
| <tt>strings.Trim(''string'', ''chars'')</tt> |
|||
|[[Go (programming language)|Go]] |
|||
|- |
|||
| <tt>LTRIM(RTRIM(''String''))</tt> |
|||
|[[Oracle Corporation|Oracle]] [[SQL]], [[T-SQL]] |
|||
|- |
|||
| <tt>strip(''string'' [,''option'', ''char''])</tt> |
|||
|[[REXX (programming language)|REXX]] |
|||
|- |
|||
| <tt>string:strip(''string'' [,''option'', ''char''])</tt> |
|||
|[[Erlang (programming language)|Erlang]] |
|||
|- |
|||
| <tt>''string''.strip</tt> |
|||
|[[Ruby (programming language)|Ruby]] |
|||
|- |
|||
| <tt>''string'' =~ s/^\s+//r =~ s/\s+$//r</tt> |
|||
|[[Perl 5]] |
|||
|- |
|||
| <tt>''string''.trim</tt> |
|||
|[[Perl 6]] |
|||
|- |
|||
| <tt>trim(''string'')</tt> |
|||
|[[PHP]] |
|||
|- |
|||
| <tt>[''string'' stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]</tt> |
|||
|[[Objective-C]] using [[Cocoa (API)|Cocoa]] |
|||
|- |
|||
| <tt>''string'' withBlanksTrimmed<br>''string'' withoutSpaces<br>''string'' withoutSeparators</tt> |
|||
|[[Smalltalk]] (Squeak, Pharo)<br>[[Smalltalk]] |
|||
|- |
|||
|<tt>strip(string)</tt> |
|||
|[[SAS System|SAS]] |
|||
|- |
|||
|<tt>string trim ''$string''</tt> |
|||
|[[Tcl]] |
|||
|- |
|||
| <tt>TRIM(''string'') or TRIM(ADJUSTL(''string''))</tt> |
|||
|[[Fortran]] |
|||
|- |
|||
| <tt>TRIM(''string'')</tt> |
|||
|[[SQL]] |
|||
|- |
|||
| <tt>TRIM(''string'') or LTrim(''string'') or RTrim(''String'')</tt> |
|||
|[[ColdFusion]] |
|||
|- |
|||
| <tt>String.trim ''string''</tt> |
|||
|[[OCaml]] 4+ |
|||
|} |
|||
While most algorithms return a new (trimmed) string, some alter the original string [[in-place]]. Notably, the [[Boost library]] allows either in-place trimming or a trimmed copy to be returned. |
|||
===Other languages=== |
|||
In languages without a built-in trim function, it is usually simple to create a custom function which accomplishes the same task. |
|||
==Definition of whitespace== |
|||
====AWK==== |
|||
The characters that are considered whitespace vary between programming languages and implementations. For example, C traditionally only counts space, tab, line feed, and carriage return characters, while languages which support [[Unicode]] typically include all Unicode space characters. Some implementations also include [[ASCII]] control codes (non-printing characters) along with whitespace characters. |
|||
In [[AWK programming language|AWK]], one can use regular expressions to trim: |
|||
Java's trim method considers ASCII spaces and control codes as whitespace, contrasting with the Java <code>isWhitespace()</code> method,<ref>{{cite web|url=http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isWhitespace(char) |title=Character (Java 2 Platform SE 5.0) |publisher=Java.sun.com |access-date=2013-08-24}}</ref> which recognizes all Unicode space characters. |
|||
<source lang="awk"> |
|||
ltrim(v) = gsub(/^[ \t]+/, "", v) |
|||
rtrim(v) = gsub(/[ \t]+$/, "", v) |
|||
trim(v) = ltrim(v); rtrim(v) |
|||
</source> |
|||
Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace. |
|||
or: |
|||
=== Non-space blanks === |
|||
<source lang="awk"> |
|||
The [[Braille Patterns]] Unicode block contains {{unichar|2800|Braille pattern blank|html=}}, a [[Braille]] pattern with no dots raised. |
|||
function ltrim(s) { sub(/^[ \t]+/, "", s); return s } |
|||
The Unicode standard explicitly states that it does not act as a space. |
|||
function rtrim(s) { sub(/[ \t]+$/, "", s); return s } |
|||
function trim(s) { return rtrim(ltrim(s)); } |
|||
</source> |
|||
The [[Non-breaking space]] {{unichar|00A0|NO-BREAK SPACE|html=}} can also be treated as non-space for trimming purposes. |
|||
====C/C++==== |
|||
There is no standard trim function in C or C++. Most of the available string libraries<ref>{{cite web|url=http://www.and.org/vstr/comparison |title=String library comparison |publisher=And.org |date= |accessdate=2013-08-24}}</ref> for C contain code which implements trimming, or functions that significantly ease an efficient implementation. The function has also often been called '''EatWhitespace''' in some non-standard C libraries. |
|||
==Usage== |
|||
In C, programmers often combine a ltrim and rtrim to implement trim: |
|||
{{Main|Comparison of programming languages (string functions)#trim}} |
|||
<source lang="C"> |
|||
#include <string.h> |
|||
#include <ctype.h> |
|||
void rtrim(char *str) |
|||
{ |
|||
size_t n; |
|||
n = strlen(str); |
|||
while (n > 0 && isspace((unsigned char)str[n - 1])) { |
|||
n--; |
|||
} |
|||
str[n] = '\0'; |
|||
} |
|||
void ltrim(char *str) |
|||
{ |
|||
size_t n; |
|||
n = 0; |
|||
while (str[n] != '\0' && isspace((unsigned char)str[n])) { |
|||
n++; |
|||
} |
|||
memmove(str, str + n, strlen(str) - n + 1); |
|||
} |
|||
void trim(char *str) |
|||
{ |
|||
rtrim(str); |
|||
ltrim(str); |
|||
} |
|||
</source> |
|||
The [[open-source software|open source]] C++ library [[Boost library|Boost]] has several trim variants, including a standard one:<ref>{{cite web|url=http://www.boost.org/doc/html/string_algo/usage.html#id2742817 |title=Usage - 1.54.0 |publisher=Boost.org |date=2013-05-22 |accessdate=2013-08-24}}</ref> |
|||
<source lang="cpp"> |
|||
#include <boost/algorithm/string/trim.hpp> |
|||
trimmed = boost::algorithm::trim_copy("string"); |
|||
</source> |
|||
Note that with boost's function named simply <code>trim</code> the input sequence is modified in-place,<ref>[http://www.boost.org/doc/html/trim.html ]{{dead link|date=August 2013}}</ref> and does not return a result. |
|||
Another [[open-source software|open source]] C++ library [[Qt (toolkit)|Qt]] has several trim variants, including a standard one:<ref>http://doc.trolltech.com/4.5/qstring.html#trimmed</ref> |
|||
<source lang="cpp-qt"> |
|||
#include <QString> |
|||
trimmed = s.trimmed(); |
|||
</source> |
|||
The [[Linux kernel]] also includes a strip function, <code>strstrip()</code>, since 2.6.18-rc1, which trims the string "in place". Since 2.6.33-rc1, the kernel uses <code>strim()</code> instead of <code>strstrip()</code> to avoid false warnings.<ref>[http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.33-rc1-git1.log ]{{dead link|date=August 2013}}</ref> |
|||
====Haskell==== |
|||
A trim algorithm in [[Haskell (programming language)|Haskell]]: |
|||
<source lang="haskell"> |
|||
import Data.Char (isSpace) |
|||
trim :: String -> String |
|||
trim = f . f |
|||
where f = reverse . dropWhile isSpace |
|||
</source> |
|||
may be interpreted as follows: ''f'' drops the preceding whitespace, and reverses the string. ''f'' is then again applied to its own output. Note that the type signature (the second line) is optional. |
|||
====J==== |
|||
The trim algorithm in [[J (programming language)|J]] is a [[functional programming|functional]] description: |
|||
<source lang="J"> |
|||
trim =. #~ [: (+./\ *. +./\.) ' '&~: |
|||
</source> |
|||
That is: filter (<code>#~</code>) for non-space characters (<code>' '&~:</code>) between leading (<code>+./\</code>) and (<code>*.</code>) trailing (<code>+./\.</code>) spaces. |
|||
====JavaScript==== |
|||
There is a built-in trim function in JavaScript 1.8.1 (Firefox 3.5 and later), and the ECMAScript 5 standard. In earlier versions it can be added to the String object's prototype as follows: |
|||
<source lang="javascript"> |
|||
String.prototype.trim = function() { |
|||
return this.replace(/^\s+/g, "").replace(/\s+$/g, ""); |
|||
}; |
|||
</source> |
|||
====Perl==== |
|||
Perl 5 has no built-in trim function. However, the functionality is commonly achieved using [[regular expression]]s. |
|||
Example: |
|||
<source lang="perl"> |
|||
$string =~ s/^\s+//; # remove leading whitespace |
|||
$string =~ s/\s+$//; # remove trailing whitespace |
|||
</source> |
|||
or: |
|||
<source lang="perl"> |
|||
$string =~ s/^\s+|\s+$//g ; # remove both leading and trailing whitespace |
|||
</source> |
|||
These examples modify the value of the original variable <code>$string</code>. |
|||
Also available for Perl is '''StripLTSpace''' in <code>String::Strip</code> from [[CPAN]]. |
|||
There are, however, two functions that are commonly used to strip whitespace from the end of strings, <code>chomp</code> and <code>chop</code>: |
|||
* <code>[http://perldoc.perl.org/functions/chop.html chop]</code> removes the last character from a string and returns it. |
|||
* <code>[http://perldoc.perl.org/functions/chomp.html chomp]</code> removes the trailing newline character(s) from a string if present. (What constitutes a newline is [http://perldoc.perl.org/perlvar.html $INPUT_RECORD_SEPARATOR] dependent). |
|||
In [[Perl 6]], the upcoming major revision of the language, strings have a <code>trim</code> method. |
|||
Example: |
|||
<source lang="perl"> |
|||
$string = $string.trim; # remove leading and trailing whitespace |
|||
$string .= trim; # same thing |
|||
</source> |
|||
====Tcl==== |
|||
The [[Tcl]] <code>string</code> command has three relevant subcommands: <code>trim</code>, <code>trimright</code> and <code>trimleft</code>. For each of those commands, an additional argument may be specified: a string that represents a set of characters to remove—the default is whitespace (space, tab, newline, carriage return). |
|||
Example of trimming vowels: |
|||
<source lang="tcl"> |
|||
set string onomatopoeia |
|||
set trimmed [string trim $string aeiou] ;# result is nomatop |
|||
set r_trimmed [string trimright $string aeiou] ;# result is onomatop |
|||
set l_trimmed [string trimleft $string aeiou] ;# result is nomatopoeia |
|||
</source> |
|||
====XSLT==== |
|||
[[XSL Transformations|XSLT]] includes the function <code>normalize-space(''string'')</code> which strips leading and trailing whitespace, in addition to replacing any whitespace sequence (including line breaks) with a single space. |
|||
Example: |
|||
<source lang="xml"> |
|||
<xsl:variable name='trimmed'> |
|||
<xsl:value-of select='normalize-space(string)'/> |
|||
</xsl:variable> |
|||
</source> |
|||
XSLT 2.0 includes regular expressions, providing another mechanism to perform string trimming. |
|||
Another XSLT technique for trimming is to utilize the XPath 2.0 <code>substring()</code> function. |
|||
==See also== |
|||
*[[Comparison of programming languages (string functions)]] |
|||
==References== |
==References== |
||
Line 285: | Line 55: | ||
*[http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm#M46 Tcl: string trim] |
*[http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm#M46 Tcl: string trim] |
||
*[http://blog.stevenlevithan.com/archives/faster-trim-javascript Faster JavaScript Trim] - compares various JavaScript trim implementations |
*[http://blog.stevenlevithan.com/archives/faster-trim-javascript Faster JavaScript Trim] - compares various JavaScript trim implementations |
||
*[http://webwidetutor.com/php/ |
*[http://webwidetutor.com/php/PHP-Change-String-value-behaviour-or-look-?id=8 php string cut and trimming]- php string cut and trimming |
||
[[Category:Articles with example code]] |
[[Category:Articles with example code]] |
||
[[Category:Programming language comparisons]] |
|||
[[Category:String (computer science)]] |
[[Category:String (computer science)]] |
Latest revision as of 10:58, 3 December 2023
This article needs additional citations for verification. (February 2015)
In computer programming, trimming (trim) or stripping (strip) is a string manipulation in which leading and trailing whitespace is removed from a string.
For example, the string (enclosed by apostrophes)
' this is a test '
would be changed, after trimming, to
'this is a test'
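As a minimal sketch of the example above, assuming Python is used purely for illustration (the article itself is language-neutral), the built-in str.strip() method performs this kind of trim:

```python
# Leading and trailing whitespace is removed; interior spaces are kept.
original = " this is a test "
trimmed = original.strip()  # strip() returns a new string

print(repr(original))  # ' this is a test '  (the original is unchanged)
print(repr(trimmed))   # 'this is a test'
```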
Variants
Left or right trimming
The most popular variants of the trim function strip only the beginning or only the end of the string. They are typically named ltrim and rtrim respectively, or lstrip and rstrip in Python. C# uses TrimStart and TrimEnd, and Common Lisp uses string-left-trim and string-right-trim. Pascal and Java do not have these variants built in, although Object Pascal (Delphi) has TrimLeft and TrimRight functions.[1]
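A short Python sketch of the one-sided variants, using the lstrip and rstrip names mentioned above (the sample string is an arbitrary illustration):

```python
text = "  padded  "

left_trimmed = text.lstrip()   # removes leading whitespace only
right_trimmed = text.rstrip()  # removes trailing whitespace only

print(repr(left_trimmed))   # 'padded  '
print(repr(right_trimmed))  # '  padded'
```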
Whitespace character list parameterization
Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's string-trim function, the parameter (called character-bag) is required. The C++ Boost library defines space characters according to locale, and also offers variants with a predicate parameter (a functor) to select which characters are trimmed.
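The following Python sketch illustrates both ideas. The optional argument of str.strip() is a set of characters to remove. The trim_if helper is a hypothetical name introduced here to show a predicate-based variant; it is not part of Python's standard library and is only loosely modelled on the predicate idea described above.

```python
# Character-list parameterization: the argument is treated as a set of
# characters, not as a prefix or suffix string.
word = "onomatopoeia"
both = word.strip("aeiou")    # 'nomatop'
right = word.rstrip("aeiou")  # 'onomatop'
left = word.lstrip("aeiou")   # 'nomatopoeia'


def trim_if(s, pred):
    """Hypothetical helper: trim leading and trailing characters for which
    pred(character) is true."""
    start, end = 0, len(s)
    while start < end and pred(s[start]):
        start += 1
    while end > start and pred(s[end - 1]):
        end -= 1
    return s[start:end]


digits_removed = trim_if("123abc456", str.isdigit)  # 'abc'
```

Passing a predicate rather than a character list lets the caller decide per character, which is convenient when the set of characters to trim is large or locale-dependent.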
Special empty string return value
An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, Apache Jakarta's StringUtils has a function called stripToNull which returns null in place of an empty string.
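A rough Python analogue of this behaviour might look as follows. The name strip_to_none is an assumption made for this sketch, and None stands in for Java's null; it is not a port of the StringUtils implementation.

```python
def strip_to_none(s):
    """Hypothetical analogue of stripToNull: return None when nothing
    remains after trimming (or when the input is already None)."""
    if s is None:
        return None
    trimmed = s.strip()
    return trimmed if trimmed else None


print(strip_to_none("  abc  "))  # abc
print(strip_to_none("     "))    # None
```

Returning a distinct "no value" marker lets callers treat blank input and missing input the same way, at the cost of having to check for None afterwards.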
Space normalization
Space normalization is a related string manipulation in which, in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named Trim() in spreadsheet applications (including Excel, Calc, Gnumeric, and Google Docs), and by the normalize-space() function in XSLT and XPath.
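Space normalization can be sketched in Python by splitting on runs of whitespace and rejoining with single spaces. The function name normalize_space is chosen here for illustration, and the exact set of characters treated as whitespace may differ from that used by the spreadsheet and XPath functions named above.

```python
def normalize_space(s):
    """Trim the ends and collapse each internal whitespace run to one space."""
    # str.split() with no argument splits on runs of whitespace and
    # discards empty strings at both ends.
    return " ".join(s.split())


print(repr(normalize_space("  this \t is\n\n a   test  ")))  # 'this is a test'
```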
In-place trimming
While most algorithms return a new (trimmed) string, some alter the original string in place. Notably, the Boost library allows either in-place trimming or a trimmed copy to be returned.
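Python's str type is immutable, so an in-place trim cannot be shown on it directly. As an illustrative sketch under that constraint, the hypothetical function below trims a mutable bytearray in place, removing only the six ASCII whitespace bytes listed in the code:

```python
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"


def trim_in_place(buf: bytearray) -> None:
    """Remove leading and trailing ASCII whitespace from buf, mutating it."""
    # Trim the tail first so the leading scan works on a shorter buffer.
    end = len(buf)
    while end > 0 and buf[end - 1] in ASCII_WHITESPACE:
        end -= 1
    del buf[end:]

    start = 0
    while start < len(buf) and buf[start] in ASCII_WHITESPACE:
        start += 1
    del buf[:start]


data = bytearray(b"  in place \n")
trim_in_place(data)  # returns None; data itself is modified
print(data)          # bytearray(b'in place')
```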
Definition of whitespace
The characters that are considered whitespace vary between programming languages and implementations. For example, C traditionally only counts space, tab, line feed, and carriage return characters, while languages which support Unicode typically include all Unicode space characters. Some implementations also include ASCII control codes (non-printing characters) along with whitespace characters.
Java's trim method considers ASCII spaces and control codes as whitespace, contrasting with the Java isWhitespace() method,[2] which recognizes all Unicode space characters.
Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace.
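The effect of these differing definitions can be sketched in Python. The helper trim_upto_space is a hypothetical function that imitates the "U+0000 through U+0020" rule described above by passing an explicit character set to str.strip(); it is compared with Python's default strip(), which uses a Unicode-aware notion of whitespace.

```python
# All code points from U+0000 (NULL) through U+0020 (SPACE).
CONTROL_AND_SPACE = "".join(chr(cp) for cp in range(0x21))


def trim_upto_space(s):
    """Hypothetical helper: trim only code points U+0000..U+0020."""
    return s.strip(CONTROL_AND_SPACE)


sample = "\x00\x01 data\u2003"  # NUL, SOH, space, 'data', EM SPACE

by_code_point = trim_upto_space(sample)  # 'data\u2003'
by_unicode = sample.strip()              # '\x00\x01 data'
```

The two results differ at both ends: the code-point rule removes the control characters but keeps U+2003 EM SPACE, while the default strip() removes the EM SPACE but stops at the leading NUL, which Python does not classify as whitespace.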
Non-space blanks
The Braille Patterns Unicode block contains U+2800 ⠀ BRAILLE PATTERN BLANK, a Braille pattern with no dots raised. The Unicode standard explicitly states that it does not act as a space.
The non-breaking space U+00A0 NO-BREAK SPACE can also be treated as non-space for trimming purposes.
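How a given trim function treats these characters depends on the implementation. As an illustration, the Python sketch below shows that the default str.strip() leaves U+2800 alone but does remove U+00A0; to preserve the non-breaking space, an explicit character set can be passed instead. The ASCII_WS constant is a name chosen for this example.

```python
BRAILLE_BLANK = "\u2800"
NBSP = "\u00a0"
ASCII_WS = " \t\n\r\x0b\x0c"  # explicit set that excludes U+00A0

# U+2800 is not classified as whitespace, so it survives a default strip().
braille_kept = (BRAILLE_BLANK + "x" + BRAILLE_BLANK).strip()

# U+00A0 is classified as whitespace by Python and is removed by default.
nbsp_removed = (NBSP + "x" + NBSP).strip()

# With an explicit character set, the non-breaking spaces are preserved
# and only the outer ASCII spaces are trimmed.
nbsp_kept = (" " + NBSP + "x" + NBSP + " ").strip(ASCII_WS)
```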
Usage
Main article: Comparison of programming languages (string functions) § trim
References
[1] "Trim". Freepascal.org. 2013-02-02. Retrieved 2013-08-24.
[2] "Character (Java 2 Platform SE 5.0)". Java.sun.com. Retrieved 2013-08-24.
External links
- Tcl: string trim
- Faster JavaScript Trim - compares various JavaScript trim implementations
- PHP string cut and trimming