
Trimming (computer programming): Difference between revisions

From Wikipedia, the free encyclopedia
m Reverted edit by Funkywizard (talk) to last version by Monkbot
 
(22 intermediate revisions by 18 users not shown)
Line 1: Line 1:
{{Manual|date=February 2009}}
{{Refimprove|date=February 2015}}
{{bare URLs|date=June 2013}}


In [[computer programming]], '''trimming''' ('''trim''') or '''stripping''' ('''strip''') is a [[string manipulation]] in which leading and trailing [[whitespace (computer science)|whitespace]] is removed from a [[string (computer science)|string]].
In [[computer programming]], '''trimming''' ('''trim''') or '''stripping''' ('''strip''') is a [[string (computer science)|string manipulation]] in which leading and trailing [[whitespace character|whitespace]] is removed from a [[string (computer science)|string]].


For example, the string (enclosed by apostrophes)
For example, the string (enclosed by apostrophes)


<source lang=text>' this is a test '</source>
<syntaxhighlight lang=text>' this is a test '</syntaxhighlight>


would be changed, after trimming, to
would be changed, after trimming, to


<source lang=text>'this is a test'</source>
<syntaxhighlight lang=text>'this is a test'</syntaxhighlight>


==Variants==
==Variants==
;Left or right trimming
:The most popular variants of the trim function strip only the beginning or end of the string. Typically named '''ltrim''' and '''rtrim''' respectively, or in the case of Python: '''lstrip''' and '''rstrip'''. C# uses '''TrimStart''' and '''TrimEnd''', and Common Lisp '''string-left-trim''' and '''string-right-trim'''. Pascal and Java do not have these variants built-in, although [[Object Pascal]] (Delphi) has '''TrimLeft''' and '''TrimRight''' functions.<ref>{{cite web|url=http://www.freepascal.org/docs-html/rtl/sysutils/trim.html |title=Trim |publisher=Freepascal.org |date=2013-02-02 |accessdate=2013-08-24}}</ref>


===Left or right trimming===
;Whitespace character list parameterization
:Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's <code>string-trim</code> function, the parameter (called ''character-bag'') is required. The C++ [[Boost library]] defines space characters according to [[locale]], as well as offering variants with a [[predicate (computer programming)|predicate]] parameter (a [[functor]]) to select which characters are trimmed.


The most popular variants of the trim function strip only the beginning or end of the string. Typically named '''ltrim''' and '''rtrim''' respectively, or in the case of Python: '''lstrip''' and '''rstrip'''. C# uses '''TrimStart''' and '''TrimEnd''', and Common Lisp '''string-left-trim''' and '''string-right-trim'''. Pascal and Java do not have these variants built-in, although [[Object Pascal]] (Delphi) has '''TrimLeft''' and '''TrimRight''' functions.<ref>{{cite web|url=http://www.freepascal.org/docs-html/rtl/sysutils/trim.html |title=Trim |publisher=Freepascal.org |date=2013-02-02 |access-date=2013-08-24}}</ref>
;Special empty string return value
:An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, [[Jakarta Project|Apache Jakarta]]'s '''StringUtils''' has a function called <code>stripToNull</code> which returns <code>null</code> in place of an empty string.


===Whitespace character list parameterization===
;Space normalization
:Space normalization is a related string manipulation where in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named <code>Trim()</code> in spreadsheet applications (including [[Microsoft Excel|Excel]], [[OpenOffice.org Calc|Calc]], [[Gnumeric]], and [[Google Docs]]), and by the <code>normalize-space()</code> function in [[XSL Transformations|XSLT]] and [[XPath]].


Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's <code>string-trim</code> function, the parameter (called ''character-bag'') is required. The C++ [[Boost library]] defines space characters according to [[Locale (computer software)|locale]], as well as offering variants with a [[predicate (computer programming)|predicate]] parameter (a [[functor]]) to select which characters are trimmed.
;In-place trimming
:While most algorithms return a new (trimmed) string, some alter the original string [[in-place]]. Notably, the [[Boost library]] allows either in-place trimming or a trimmed copy to be returned.


===Special empty string return value===
==Definition of whitespace==
The characters which are considered whitespace vary between programming languages and implementations. For example, C traditionally only counts space, tab, line feed, and carriage return characters, while languages which support [[Unicode]] typically include all Unicode space characters. Some implementations also include [[ASCII]] control codes (non-printing characters) along with whitespace characters.


An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, [[Jakarta Project|Apache Jakarta]]'s '''StringUtils''' has a function called <code>stripToNull</code> which returns <code>null</code> in place of an empty string.
Java's trim method considers ASCII spaces and control codes as whitespace, contrasting with the Java <code>isWhitespace()</code> method,<ref>{{cite web|url=http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isWhitespace(char) |title=Character (Java 2 Platform SE 5.0) |publisher=Java.sun.com |date= |accessdate=2013-08-24}}</ref> which recognizes all Unicode space characters.


===Space normalization===
Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace.


Space normalization is a related string manipulation where in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named <code>Trim()</code> in spreadsheet applications (including [[Microsoft Excel|Excel]], [[OpenOffice.org Calc|Calc]], [[Gnumeric]], and [[Google Docs]]), and by the <code>normalize-space()</code> function in [[XSL Transformations|XSLT]] and [[XPath]].
==Usage==
Following are examples of trimming a string using several programming languages. All of the implementations shown return a new string and do not alter the original variable.


===In-place trimming===
{| class="wikitable"
|- style="text-align:left;"
! Example usage !! Languages
|-
| <tt>''String''.Trim([''chars''])</tt>
|[[C Sharp (programming language)|C#]], [[VB.NET]], [[Windows PowerShell]]
|-
| <tt>''string''.strip();</tt>
|[[D (programming language)|D]]
|-
| <tt>(.trim ''string'')</tt>
|[[Clojure]]
|-
|<tt>''sequence'' [ predicate? ] trim</tt>
|[[Factor (programming language)|Factor]]
|-
| <tt>(string-trim '(#\Space #\Tab #\Newline) ''string'')</tt>
|[[Common Lisp]]
|-
| <tt>(string-trim ''string'')</tt>
|[[Scheme (programming language)|Scheme]]
|-
| <tt>''string''.trim()</tt>
|[[Java (programming language)|Java]], [[JavaScript]] (1.8.1+, Firefox 3.5+)
|-
| <tt>Trim(''String'')</tt>
|[[Pascal (programming language)|Pascal]],<ref>{{cite web|url=http://gnu-pascal.de/gpc-hr/Trim.html |title=Trim - GNU Pascal priručnik |publisher=Gnu-pascal.de |date= |accessdate=2013-08-24}}</ref> [[QBasic]], [[Visual Basic]], [[Delphi programming language|Delphi]]
|-
| <tt>''string''.strip()</tt>
|[[Python (programming language)|Python]]
|-
| <tt>strings.Trim(''string'', ''chars'')</tt>
|[[Go (programming language)|Go]]
|-
| <tt>LTRIM(RTRIM(''String''))</tt>
|[[Oracle Corporation|Oracle]] [[SQL]], [[T-SQL]]
|-
| <tt>strip(''string'' [,''option'', ''char''])</tt>
|[[REXX (programming language)|REXX]]
|-
| <tt>string:strip(''string'' [,''option'', ''char''])</tt>
|[[Erlang (programming language)|Erlang]]
|-
| <tt>''string''.strip</tt>
|[[Ruby (programming language)|Ruby]]
|-
| <tt>''string'' =~ s/^\s+//r =~ s/\s+$//r</tt>
|[[Perl 5]]
|-
| <tt>''string''.trim</tt>
|[[Perl 6]]
|-
| <tt>trim(''string'')</tt>
|[[PHP]]
|-
| <tt>[''string'' stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]</tt>
|[[Objective-C]] using [[Cocoa (API)|Cocoa]]
|-
| <tt>''string'' withBlanksTrimmed<br>''string'' withoutSpaces<br>''string'' withoutSeparators</tt>
|[[Smalltalk]] (Squeak, Pharo)<br>[[Smalltalk]]
|-
|<tt>strip(string)</tt>
|[[SAS System|SAS]]
|-
|<tt>string trim ''$string''</tt>
|[[Tcl]]
|-
| <tt>TRIM(''string'') or TRIM(ADJUSTL(''string''))</tt>
|[[Fortran]]
|-
| <tt>TRIM(''string'')</tt>
|[[SQL]]
|-
| <tt>TRIM(''string'') or LTrim(''string'') or RTrim(''String'')</tt>
|[[ColdFusion]]
|-
| <tt>String.trim ''string''</tt>
|[[OCaml]] 4+
|}


While most algorithms return a new (trimmed) string, some alter the original string [[in-place]]. Notably, the [[Boost library]] allows either in-place trimming or a trimmed copy to be returned.
===Other languages===
In languages without a built-in trim function, it is usually simple to create a custom function which accomplishes the same task.


==Definition of whitespace==
====AWK====
The characters which are considered whitespace vary between programming languages and implementations. For example, C traditionally only counts space, tab, line feed, and carriage return characters, while languages which support [[Unicode]] typically include all Unicode space characters. Some implementations also include [[ASCII]] control codes (non-printing characters) along with whitespace characters.
In [[AWK programming language|AWK]], one can use regular expressions to trim:


Java's trim method considers ASCII spaces and control codes as whitespace, contrasting with the Java <code>isWhitespace()</code> method,<ref>{{cite web|url=http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isWhitespace(char) |title=Character (Java 2 Platform SE 5.0) |publisher=Java.sun.com |access-date=2013-08-24}}</ref> which recognizes all Unicode space characters.
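<source lang="awk">
# Each gsub call edits v in place and returns the number of substitutions made.
gsub(/^[ \t]+/, "", v)          # ltrim: remove leading spaces and tabs
gsub(/[ \t]+$/, "", v)          # rtrim: remove trailing spaces and tabs
gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim: remove both
</source>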


Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace.
or:


=== Non-space blanks ===
The [[Braille Patterns]] Unicode block contains {{unichar|2800|Braille pattern blank|html=}}, a [[Braille]] pattern with no dots raised.
The Unicode standard explicitly states that it does not act as a space.
<source lang="awk">
function ltrim(s) { sub(/^[ \t]+/, "", s); return s }
function rtrim(s) { sub(/[ \t]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
</source>


The [[Non-breaking space]] {{unichar|00A0|NO-BREAK SPACE|html=}} can also be treated as non-space for trimming purposes.
====C/C++====
There is no standard trim function in C or C++. Most of the available string libraries<ref>{{cite web|url=http://www.and.org/vstr/comparison |title=String library comparison |publisher=And.org |date= |accessdate=2013-08-24}}</ref> for C contain code which implements trimming, or functions that significantly ease an efficient implementation. The function has also often been called '''EatWhitespace''' in some non-standard C libraries.


==Usage==
In C, programmers often combine a ltrim and rtrim to implement trim:
{{Main|Comparison of programming languages (string functions)#trim}}

<source lang="C">
#include <string.h>
#include <ctype.h>

void rtrim(char *str)
{
    size_t n;
    n = strlen(str);
    while (n > 0 && isspace((unsigned char)str[n - 1])) {
        n--;
    }
    str[n] = '\0';
}

void ltrim(char *str)
{
    size_t n;
    n = 0;
    while (str[n] != '\0' && isspace((unsigned char)str[n])) {
        n++;
    }
    memmove(str, str + n, strlen(str) - n + 1);
}

void trim(char *str)
{
    rtrim(str);
    ltrim(str);
}
</source>

The [[open-source software|open source]] C++ library [[Boost library|Boost]] has several trim variants, including a standard one:<ref>{{cite web|url=http://www.boost.org/doc/html/string_algo/usage.html#id2742817 |title=Usage - 1.54.0 |publisher=Boost.org |date=2013-05-22 |accessdate=2013-08-24}}</ref>

<source lang="cpp">
#include <string>
#include <boost/algorithm/string/trim.hpp>
std::string trimmed = boost::algorithm::trim_copy(std::string("  string  "));
</source>

Note that with boost's function named simply <code>trim</code> the input sequence is modified in-place,<ref>[http://www.boost.org/doc/html/trim.html ]{{dead link|date=August 2013}}</ref> and does not return a result.

Another [[open-source software|open source]] C++ library [[Qt (toolkit)|Qt]] has several trim variants, including a standard one:<ref>http://doc.trolltech.com/4.5/qstring.html#trimmed</ref>

<source lang="cpp-qt">
#include <QString>
trimmed = s.trimmed();
</source>

The [[Linux kernel]] also includes a strip function, <code>strstrip()</code>, since 2.6.18-rc1, which trims the string "in place". Since 2.6.33-rc1, the kernel uses <code>strim()</code> instead of <code>strstrip()</code> to avoid false warnings.<ref>[http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.33-rc1-git1.log ]{{dead link|date=August 2013}}</ref>

====Haskell====
A trim algorithm in [[Haskell (programming language)|Haskell]]:

<source lang="haskell">
import Data.Char (isSpace)

trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace
</source>

may be interpreted as follows: ''f'' drops the preceding whitespace, and reverses the string. ''f'' is then again applied to its own output. Note that the type signature (the second line) is optional.

====J====

The trim algorithm in [[J (programming language)|J]] is a [[functional programming|functional]] description:

<source lang="J">
trim =. #~ [: (+./\ *. +./\.) ' '&~:
</source>

That is: filter (<code>#~</code>) for non-space characters (<code>' '&~:</code>) between leading (<code>+./\</code>) and (<code>*.</code>) trailing (<code>+./\.</code>) spaces.

====JavaScript====
There is a built-in trim function in JavaScript 1.8.1 (Firefox 3.5 and later), and the ECMAScript 5 standard. In earlier versions it can be added to the String object's prototype as follows:

<source lang="javascript">
String.prototype.trim = function() {
    return this.replace(/^\s+/g, "").replace(/\s+$/g, "");
};
</source>

====Perl====
Perl 5 has no built-in trim function. However, the functionality is commonly achieved using [[regular expression]]s.

Example:
<source lang="perl">
$string =~ s/^\s+//; # remove leading whitespace
$string =~ s/\s+$//; # remove trailing whitespace
</source>
or:
<source lang="perl">
$string =~ s/^\s+|\s+$//g ; # remove both leading and trailing whitespace
</source>
These examples modify the value of the original variable <code>$string</code>.

Also available for Perl is '''StripLTSpace''' in <code>String::Strip</code> from [[CPAN]].

There are, however, two functions that are commonly used to strip whitespace from the end of strings, <code>chomp</code> and <code>chop</code>:
* <code>[http://perldoc.perl.org/functions/chop.html chop]</code> removes the last character from a string and returns it.
* <code>[http://perldoc.perl.org/functions/chomp.html chomp]</code> removes the trailing newline character(s) from a string if present. (What constitutes a newline is [http://perldoc.perl.org/perlvar.html $INPUT_RECORD_SEPARATOR] dependent).

In [[Perl 6]], the upcoming major revision of the language, strings have a <code>trim</code> method.

Example:
<source lang="perl">
$string = $string.trim; # remove leading and trailing whitespace
$string .= trim; # same thing
</source>

====Tcl====
The [[Tcl]] <code>string</code> command has three relevant subcommands: <code>trim</code>, <code>trimright</code> and <code>trimleft</code>. For each of those commands, an additional argument may be specified: a string that represents a set of characters to remove—the default is whitespace (space, tab, newline, carriage return).

Example of trimming vowels:

<source lang="tcl">
set string onomatopoeia
set trimmed [string trim $string aeiou] ;# result is nomatop
set r_trimmed [string trimright $string aeiou] ;# result is onomatop
set l_trimmed [string trimleft $string aeiou] ;# result is nomatopoeia
</source>

====XSLT====
[[XSL Transformations|XSLT]] includes the function <code>normalize-space(''string'')</code> which strips leading and trailing whitespace, in addition to replacing any whitespace sequence (including line breaks) with a single space.

Example:
<source lang="xml">
<xsl:variable name='trimmed'>
  <xsl:value-of select='normalize-space(string)'/>
</xsl:variable>
</source>
XSLT 2.0 includes regular expressions, providing another mechanism to perform string trimming.

Another XSLT technique for trimming is to utilize the XPath 2.0 <code>substring()</code> function.

==See also==
*[[Comparison of programming languages (string functions)]]


==References==
==References==
Line 285: Line 55:
*[http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm#M46 Tcl: string trim]
*[http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm#M46 Tcl: string trim]
*[http://blog.stevenlevithan.com/archives/faster-trim-javascript Faster JavaScript Trim] - compares various JavaScript trim implementations
*[http://blog.stevenlevithan.com/archives/faster-trim-javascript Faster JavaScript Trim] - compares various JavaScript trim implementations
*[http://webwidetutor.com/php/substring/ php string cut and trimming]- php string cut and trimming
*[http://webwidetutor.com/php/PHP-Change-String-value-behaviour-or-look-?id=8 php string cut and trimming]- php string cut and trimming

[[Category:Articles with example code]]
[[Category:Articles with example code]]
[[Category:Programming language comparisons]]
[[Category:String (computer science)]]
[[Category:String (computer science)]]

Latest revision as of 10:58, 3 December 2023

In computer programming, trimming (trim) or stripping (strip) is a string manipulation in which leading and trailing whitespace is removed from a string.

For example, the string (enclosed by apostrophes)

'  this is a test  '

would be changed, after trimming, to

'this is a test'

Variants


Left or right trimming


The most common variants of the trim function strip only the beginning or only the end of the string. They are typically named ltrim and rtrim respectively, or, in Python, lstrip and rstrip. C# uses TrimStart and TrimEnd, and Common Lisp uses string-left-trim and string-right-trim. Standard Pascal does not have these variants built in, and Java did not until Java 11 added stripLeading and stripTrailing; Object Pascal (Delphi) has TrimLeft and TrimRight functions.[1]
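
As a brief illustration of the one-sided variants, the following Python sketch applies lstrip, rstrip, and strip to the example string from the introduction. The variable names are chosen for this example only.

```python
text = "  this is a test  "

left_trimmed = text.lstrip()   # removes leading whitespace only
right_trimmed = text.rstrip()  # removes trailing whitespace only
both_trimmed = text.strip()    # removes whitespace at both ends

print(repr(left_trimmed))   # 'this is a test  '
print(repr(right_trimmed))  # '  this is a test'
print(repr(both_trimmed))   # 'this is a test'
```

Each call returns a new string; the original value of text is unchanged.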

Whitespace character list parameterization


Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's string-trim function, the parameter (called character-bag) is required. The C++ Boost library defines space characters according to locale, as well as offering variants with a predicate parameter (a functor) to select which characters are trimmed.
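
The two styles of parameterization can be sketched in Python. The built-in strip accepts a set of characters, while trim_if below is a hypothetical helper (not part of Python or of Boost) that imitates a predicate parameter under the assumption that the predicate is called once per character.

```python
def trim_if(text, should_trim):
    """Trim both ends of text while should_trim(character) is true.

    Hypothetical helper that illustrates a predicate parameter.
    """
    start = 0
    end = len(text)
    while start < end and should_trim(text[start]):
        start += 1
    while end > start and should_trim(text[end - 1]):
        end -= 1
    return text[start:end]


# Character-list parameter: the argument is a set of characters to remove,
# not a prefix or suffix to match.
trimmed_set = "--==value==--".strip("-=")

# Predicate parameter: any one-character test can select what is trimmed.
trimmed_digits = trim_if("007agent42", str.isdigit)

print(trimmed_set)     # value
print(trimmed_digits)  # agent
```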

Special empty string return value


An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, Apache Jakarta's StringUtils has a function called stripToNull which returns null in place of an empty string.
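
A minimal Python sketch of this variant follows. The function name strip_to_none is hypothetical and only loosely modeled on stripToNull; it assumes that None plays the role of null and that None input should pass through unchanged.

```python
from typing import Optional


def strip_to_none(text: Optional[str]) -> Optional[str]:
    """Trim text and return None if nothing remains (hypothetical helper)."""
    if text is None:
        return None
    stripped = text.strip()
    # An empty result is reported as None instead of "".
    return stripped if stripped else None


print(strip_to_none("  abc  "))  # abc
print(strip_to_none("     "))    # None
```

Returning a distinct value lets callers treat blank input and missing input the same way with a single check.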

Space normalization


Space normalization is a related string manipulation in which, in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named Trim() in spreadsheet applications (including Excel, Calc, Gnumeric, and Google Docs), and by the normalize-space() function in XSLT and XPath.
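
Space normalization can be sketched in Python by splitting on runs of whitespace and rejoining with single spaces. The function name normalize_space is borrowed from XPath for illustration; this sketch uses Python's own definition of whitespace, which is an assumption and may differ from the set used by XPath or by a given spreadsheet.

```python
def normalize_space(text: str) -> str:
    """Trim text and collapse each internal whitespace run to one space."""
    # str.split() with no argument splits on runs of whitespace and
    # discards empty strings at the start and end.
    return " ".join(text.split())


print(repr(normalize_space("  this   is\ta\n test  ")))  # 'this is a test'
```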

In-place trimming


While most trim functions return a new (trimmed) string, some alter the original string in place. Notably, the Boost library offers both: trim modifies its argument in place, and trim_copy returns a trimmed copy.
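
Python strings are immutable, so in-place trimming can only be illustrated on a mutable buffer. The sketch below uses a bytearray and a hypothetical helper named trim_in_place; the whitespace set is an assumption limited to six ASCII characters.

```python
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"  # assumed whitespace set for this sketch


def trim_in_place(buf: bytearray) -> None:
    """Remove leading and trailing ASCII whitespace from buf itself."""
    end = len(buf)
    while end > 0 and buf[end - 1] in ASCII_WHITESPACE:
        end -= 1
    del buf[end:]  # drop the trailing whitespace

    start = 0
    while start < len(buf) and buf[start] in ASCII_WHITESPACE:
        start += 1
    del buf[:start]  # drop the leading whitespace


data = bytearray(b"  this is a test \n")
alias = data            # a second reference to the same buffer
trim_in_place(data)
print(alias)            # bytearray(b'this is a test')
```

Because the buffer is modified rather than replaced, every reference to it observes the trimmed contents, which is the practical difference from a copy-returning trim.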

Definition of whitespace


The characters that are considered whitespace vary between programming languages and implementations. For example, C's isspace function in the default locale counts only space, horizontal tab, line feed, vertical tab, form feed, and carriage return, while languages that support Unicode typically include all Unicode space characters. Some implementations also include ASCII control codes (non-printing characters) along with whitespace characters.

Java's trim method treats every character at or below U+0020 (the ASCII space and control codes) as whitespace, in contrast to the Java isWhitespace() method,[2] which recognizes Unicode space characters other than the non-breaking ones.

Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace.
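
The effect of these differing definitions can be sketched in Python. The helper trim_through_space is hypothetical and imitates the rule of treating U+0000 through U+0020 as whitespace; it is compared with Python's built-in strip, which follows Unicode whitespace properties instead.

```python
def trim_through_space(text: str) -> str:
    """Trim characters U+0000 through U+0020 from both ends (hypothetical)."""
    start = 0
    end = len(text)
    while start < end and text[start] <= "\u0020":
        start += 1
    while end > start and text[end - 1] <= "\u0020":
        end -= 1
    return text[start:end]


# Leading NUL, tab, and space; trailing space and EM SPACE (U+2003).
sample = "\x00\t abc \u2003"

control_rule = trim_through_space(sample)  # stops at the EM SPACE
unicode_rule = sample.strip()              # stops at the NUL character

print(repr(control_rule))  # 'abc \u2003'
print(repr(unicode_rule))  # '\x00\t abc'
```

The same input therefore yields two different results, depending only on which characters the trim function counts as whitespace.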

Non-space blanks


The Braille Patterns Unicode block contains U+2800 BRAILLE PATTERN BLANK, a Braille pattern with no dots raised. The Unicode standard explicitly states that it does not act as a space.

The non-breaking space, U+00A0 NO-BREAK SPACE (HTML entities &nbsp; and &NonBreakingSpace;), can also be treated as non-space for trimming purposes.
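
How a particular trim function treats these characters is implementation-specific. As a sketch, Python's built-in strip leaves the Braille blank in place but does remove the no-break space, so keeping the latter requires a custom rule. The helper trim_keep_nbsp below is hypothetical.

```python
NO_BREAK_SPACE = "\u00a0"
BRAILLE_BLANK = "\u2800"


def trim_keep_nbsp(text: str) -> str:
    """Trim whitespace from both ends but stop at U+00A0 (hypothetical)."""
    def trimmable(ch: str) -> bool:
        return ch.isspace() and ch != NO_BREAK_SPACE

    start = 0
    end = len(text)
    while start < end and trimmable(text[start]):
        start += 1
    while end > start and trimmable(text[end - 1]):
        end -= 1
    return text[start:end]


# The Braille blank is not whitespace to Python, so strip() keeps it.
braille_kept = (BRAILLE_BLANK + "abc").strip() == BRAILLE_BLANK + "abc"

# The no-break space is whitespace to Python, so strip() removes it.
nbsp_removed = (NO_BREAK_SPACE + "abc").strip() == "abc"

# The custom rule trims the ordinary spaces but preserves the no-break space.
kept = trim_keep_nbsp(" " + NO_BREAK_SPACE + "abc ")

print(braille_kept, nbsp_removed, kept == NO_BREAK_SPACE + "abc")
```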

Usage


References

  1. ^ "Trim". Freepascal.org. 2013-02-02. Retrieved 2013-08-24.
  2. ^ "Character (Java 2 Platform SE 5.0)". Java.sun.com. Retrieved 2013-08-24.