Python Strings
Python has a built-in string class named "str" with many handy features (there is an
older module named "string" which you should not use). String literals can be
enclosed by either double or single quotes, although single quotes are more
commonly used. Backslash escapes work the usual way within both single and
double quoted literals -- e.g. \n \' \". A double quoted string literal can contain single
quotes without any fuss (e.g. "I didn't do it") and likewise a single quoted string can
contain double quotes. A string literal can span multiple lines, but there must be a
backslash \ at the end of each line to escape the newline. String literals inside triple
quotes, """ or ''', can span multiple lines of text.
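The literal forms described above can be compared side by side. This is only a small sketch; the variable names are made up for illustration.

```python
# Double and single quotes build the same kind of str object.
double_quoted = "I didn't do it"
single_quoted = 'She said "hi"'

# Backslash escapes work the same way inside either quote style.
escaped = 'I didn\'t do it'

# A backslash at the end of a line continues the literal on the next line;
# the newline itself does not become part of the string.
continued = 'first part, \
second part'

# Triple-quoted literals keep the line breaks that appear in the source.
multi_line = """line one
line two"""

print(double_quoted == escaped)   # True
print(continued)                  # first part, second part
print(multi_line.count('\n'))     # 1
```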
Python strings are "immutable" which means they cannot be changed after they are
created (Java strings also use this immutable style). Since strings can't be changed,
we construct *new* strings as we go to represent computed values. So for example
the expression ('hello' + 'there') takes in the 2 strings 'hello' and 'there' and builds a
new string 'hellothere'.
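As a rough illustration of immutability, the sketch below shows that concatenation leaves its operands alone and that assigning to a character position is rejected. The names greeting, combined and changed_in_place are hypothetical.

```python
greeting = 'hello'

# Concatenation builds a brand-new string; the operands are left untouched.
combined = greeting + 'there'
print(combined)   # hellothere
print(greeting)   # hello

# Trying to change a character in place raises a TypeError.
try:
    greeting[0] = 'j'
    changed_in_place = True
except TypeError:
    changed_in_place = False
print(changed_in_place)   # False
```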
Characters in a string can be accessed using the standard [ ] syntax, and like Java
and C++, Python uses zero-based indexing, so if s is 'hello', s[1] is 'e'. If the index is
out of bounds for the string, Python raises an error. The Python style (unlike Perl) is
to halt if it can't tell what to do, rather than just make up a default value. The handy
"slice" syntax (below) also works to extract any substring from a string. The
len(string) function returns the length of a string. The [ ] syntax and the len() function
actually work on any sequence type -- strings, lists, etc. Python tries to make its
operations work consistently across different types. Python newbie gotcha: don't use
"len" as a variable name, since that would block out the len() function. The '+' operator can
concatenate two strings. Notice in the code below that variables are not pre-declared
-- just assign to them and go.
s = 'hi'
print s[1] ## i
print len(s) ## 2
print s + ' there' ## hi there
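To make the out-of-bounds behavior and the "any sequence type" point concrete, here is a short sketch. It wraps print's argument in parentheses so the same lines should run on newer Python versions too; the names out_of_bounds_ok and items are invented for the example.

```python
s = 'hello'
print(s[1])      # e
print(len(s))    # 5

# An out-of-bounds index raises IndexError instead of returning a default.
try:
    s[10]
    out_of_bounds_ok = True
except IndexError:
    out_of_bounds_ok = False
print(out_of_bounds_ok)   # False

# [ ] and len() behave the same way on other sequence types, such as lists.
items = ['a', 'b', 'c']
print(items[1])     # b
print(len(items))   # 3
```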
Unlike Java, the '+' does not automatically convert numbers or other types to string
form. The str() function converts values to a string form so they can be combined
with other strings.
pi = 3.14
##text = 'The value of pi is ' + pi ## NO, does not work
text = 'The value of pi is ' + str(pi) ## yes
For numbers, the standard operators, +, /, * work in the usual way. There is no ++
operator, but +=, -=, etc. work. If you want integer division, it is most correct to use 2
slashes -- e.g. 6 // 5 is 1 (before Python 3, a single / does int division with
ints anyway, but moving forward // is the preferred way to indicate that you want int
division).
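A brief sketch of the number operators mentioned above; the variable name count is just a placeholder.

```python
count = 5
count += 1          # there is no count++ in Python
count -= 2
print(count)        # 4

# // asks for integer division explicitly, whatever the Python version.
print(6 // 5)       # 1
print(7 // 2)       # 3
```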
The "print" statement prints out one or more Python items followed by a newline (leave
a trailing comma at the end of the items to inhibit the newline). A "raw" string literal is
prefixed by an 'r' and passes all the chars through without special treatment of
backslashes, so r'x\nx' evaluates to the length-4 string 'x\nx'. A 'u' prefix allows you
to write a Unicode string literal (Python has lots of other Unicode support features --
see the docs below).
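The length-4 claim for the raw literal can be checked directly. In this sketch, raw and cooked are hypothetical names for the raw and the ordinary version of the same source text.

```python
raw = r'x\nx'       # the backslash and the n stay as two separate chars
cooked = 'x\nx'     # here \n is turned into a single newline char

print(len(raw))         # 4
print(len(cooked))      # 3
print(raw == 'x\\nx')   # True
```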
String Methods
A method is like a function, but
it runs "on" an object. If the variable s is a string, then the code s.lower() runs the
lower() method on that string object and returns the result (this idea of a method
running on an object is one of the basic ideas that make up Object Oriented
Programming, OOP). Here are some of the most common string methods:
A Google search for "python str" should lead you to the official python.org string
methods page, which lists all the str methods.
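As a hedged illustration, the sketch below calls a handful of standard str methods on one sample string. Which methods to show is a choice made for this example, not necessarily the selection the text had in mind, and the variable names are invented.

```python
s = 'Hello World'

lowered = s.lower()                     # 'hello world'
uppered = s.upper()                     # 'HELLO WORLD'
stripped = '  padded  '.strip()         # 'padded'
position = s.find('World')              # 6 (find returns -1 if not found)
swapped = s.replace('World', 'There')   # 'Hello There'
words = s.split(' ')                    # ['Hello', 'World']
rejoined = '-'.join(words)              # 'Hello-World'
starts = s.startswith('Hello')          # True

# Each method returns a new value; the original string is unchanged.
print(s)          # Hello World
print(rejoined)   # Hello-World
```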
Python does not have a separate character type. Instead an expression like s[8]
returns a length-1 string containing the character. With that length-1 string, the
operators ==, <=, ... all work as you would expect, so mostly you don't need to know
that Python does not have a separate scalar "char" type.
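A quick sketch of what indexing hands back, assuming a sample string long enough for s[8] to be valid; the string and the name ch are made up for the example.

```python
s = 'Hello there'
ch = s[8]

print(ch)                 # e
print(type(ch) is str)    # True
print(len(ch))            # 1

# Comparisons on the length-1 string behave like char comparisons elsewhere.
print(ch == 'e')          # True
print('a' <= ch <= 'z')   # True
```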
String Slices
The "slice" syntax is a handy way to refer to sub-parts of sequences -- typically
strings and lists. The slice s[start:end] is the elements beginning at start and
extending up to but not including end. Suppose we have s = "Hello":
s[1:4] is 'ell' -- chars starting at index 1 and extending up to but not including index 4
s[1:] is 'ello' -- omitting either index defaults to the start or end of the string
s[:] is 'Hello' -- omitting both always gives us a copy of the whole thing (this is the pythonic
way to copy a sequence like a string or list)
s[1:100] is 'ello' -- an index that is too big is truncated down to the string length
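The slice results listed above can be reproduced with a few lines. The list part at the end is an added illustration of the copy idiom, with invented names items and items_copy.

```python
s = 'Hello'

print(s[1:4])     # ell
print(s[1:])      # ello
print(s[:])       # Hello
print(s[1:100])   # ello

# The same syntax works on lists; [:] makes a separate copy of the list.
items = ['a', 'b', 'c', 'd']
print(items[1:3])            # ['b', 'c']
items_copy = items[:]
print(items_copy == items)   # True
print(items_copy is items)   # False
```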
The standard zero-based index numbers give easy access to chars near the start of
the string. As an alternative, Python uses negative numbers to give easy access to
the chars at the end o