Re: reduce all spaces to one [message #176909 is a reply to message #176908] |
Sat, 04 February 2012 23:05 |
Thomas 'PointedEars'
John wrote:
> Am 04.02.2012 15:41, schrieb Thomas 'PointedEars' Lahn:
^^^^^^^
>> It should be noted that if your question was understood literally, the
>> solutions presented so far would be wrong. \s would match too many
>> different characters, as it stands for *white-space* in PCRE, _not_ only
>> the space character. In order to reduce only all consecutive *space*
>> characters to one space character, you need to write
>>
>> echo preg_replace('/ +/', ' ', "here are some \n spaces ");
>>
>> Note that the newline, which is white-space too, is preserved here.
>
> in fact, just by 'accident' the 'regex' solutions fits perfectly in what
> I have to do next, i.e. feed the text through an 'explode' statement,
> which takes ' ' as separator.
I do not think you have fully understood what I have said (maybe you want to
try posting to *de*.comp.lang.php instead?). *All* presented solutions so
far, including yours and mine, are "'regex' solutions". But the set of
characters that are replaced differs between them.
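To make the difference concrete, here is a minimal sketch. The sample string is made up for illustration; only preg_replace() and the two patterns come from the discussion above.

```php
<?php
// Hypothetical sample text: runs of spaces plus one newline.
$text = "here  are   some \n  spaces";

// Collapse runs of the space character only; the newline survives.
$spacesOnly = preg_replace('/ +/', ' ', $text);

// Collapse runs of any white-space; the newline is swallowed as well.
$allWhitespace = preg_replace('/\\s+/', ' ', $text);

var_dump($spacesOnly);
var_dump($allWhitespace);
```

Which of the two is "right" depends on whether line breaks carry meaning in the input.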
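Anyhow, you do not need to replace spaces before word splitting. Instead of
$words = explode(' ', preg_replace('/\\s+/', ' ', $text));
(that is explode(), not str_split() [1], which only cuts a string into
fixed-length chunks), you can just split at consecutive white-space, which
preg_split() has supported since PHP 4 [2]: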
$words = preg_split('/\\s+/', $text);
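The two approaches can be compared side by side. This is only a sketch: $text is a made-up sample, explode() does the split at single spaces, and the PREG_SPLIT_NO_EMPTY flag is an addition of mine that was not part of the suggestion above.

```php
<?php
// Hypothetical input with leading and trailing white-space.
$text = "  here are\tsome \n words ";

// Two steps: normalise white-space, then split at single spaces.
$viaExplode = explode(' ', preg_replace('/\\s+/', ' ', $text));

// One step: split directly at runs of white-space.
$viaPregSplit = preg_split('/\\s+/', $text);

// Both results contain an empty string at each end, because $text
// starts and ends with white-space. PREG_SPLIT_NO_EMPTY drops them.
$words = preg_split('/\\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

If the input may start or end with white-space, either trim() it first or pass the flag; otherwise the plain one-liner is enough.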
Also note that for finding the *words* in a text, this splitting at
white-space is _not_ sufficient. For example, splitting the previous sentence
at white-space would have resulted in the words "text,", "_not_", and
"sufficient.".
Once you use preg_split(), though, there is a simple remedy (which can be
found at [2]). Just include all characters that you do not want to be part
of a word in the character class:
$words = preg_split('/[\\s,]+/', $text);
(I prefer to write `\\' instead of `\' even within single-quoted strings, in
order to make the `\' explicit. YMMV.)
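A short sketch of the character-class approach, using a shortened, made-up variant of the example sentence. The third pattern, which also lists the period, and the PREG_SPLIT_NO_EMPTY flag are my own additions for illustration.

```php
<?php
// Hypothetical sample sentence.
$text = 'in a text, this is _not_ sufficient.';

// White-space only: the comma and the period stay attached to the words.
$atWhitespace = preg_split('/\\s+/', $text);

// White-space or comma: "text," becomes "text".
$atWhitespaceOrComma = preg_split('/[\\s,]+/', $text);

// Adding the period as well; the flag drops the empty piece that the
// trailing "." would otherwise leave at the end of the result.
$words = preg_split('/[\\s,.]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// In a single-quoted string both spellings yield the same pattern.
$samePattern = ('/[\\s,]+/' === '/[\s,]+/');

print_r($words);
```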
You might also want to exclude periods (`.') and other punctuation from
words, like dashes and ellipses. Note that PCRE provides an escape sequence
for ASCII-non-word characters [3], and that it supports Unicode character
properties, which can be used to differentiate between letters and non-
letters in various natural languages [4].
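As a rough sketch of both options: \W is the non-word escape sequence described at [3], and \P{L} (the negation of the Unicode property "letter") is one possible choice from [4]. The sample strings are made up, the source file is assumed to be UTF-8, and the `u` modifier is assumed to be available in your PCRE build.

```php
<?php
// ASCII text: \W matches anything except letters, digits and "_".
$ascii = 'well-known, e.g. foo_bar 42';
$asciiWords = preg_split('/\\W+/', $ascii, -1, PREG_SPLIT_NO_EMPTY);

// UTF-8 text: with the "u" modifier, \P{L} matches any character that
// is not a letter in any script, so accented letters stay inside words
// while digits, underscores and punctuation act as separators.
$utf8 = 'Größe matters - café... _not_ really.';
$letterWords = preg_split('/\\P{L}+/u', $utf8, -1, PREG_SPLIT_NO_EMPTY);

print_r($asciiWords);
print_r($letterWords);
```

Note that \W treats digits and the underscore as part of a word, while \P{L} does not; which behaviour you want depends on what counts as a word in your text.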
You might find [5] useful, in particular [6], for the next time that you
post (translations are available there).
PointedEars
___________
[1] <http://php.net/str_split>
[2] <http://php.net/preg_split>
[3] <http://php.net/manual/en/regexp.reference.escape.php>
[4] <http://php.net/manual/en/regexp.reference.unicode.php>
[5] <http://catb.org/esr/faqs/smart-questions.html>
[6] <http://catb.org/esr/faqs/smart-questions.html#goal>
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann