
Looking behind and looking ahead

The following fragment is taken almost verbatim from the section 'Looking ahead and looking behind' in perlretut:

Zero-width assertions as a special case of looking behind and ahead

In Perl regular expressions, most regexp elements 'eat up' a certain amount of string when they match. For instance, the regexp element [abc] eats up one character of the string when it matches, in the sense that Perl moves to the next character position in the string after the match. There are some elements, however, that don't eat up characters (advance the character position) if they match.

The examples we have seen so far are the anchors. The anchor ^ matches the beginning of the line, but doesn't eat any characters.

Similarly, the word boundary anchor \b matches wherever a character matching \w is next to a character that doesn't, but it doesn't eat up any characters itself.

Anchors are examples of zero-width assertions. Zero-width, because they consume no characters, and assertions, because they test some property of the string.
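
The "zero-width" property can be checked directly. The following is a minimal sketch, assuming a hypothetical sample string; it uses the special arrays @- and @+ (start and end offsets of the last match) to measure how many characters each anchor consumed:

```perl
use strict;
use warnings;

my $text = "cat dog";   # hypothetical sample string

# For each anchor, record the width of the match: end offset minus start offset.
my @widths;
for my $anchor (qr/^/, qr/\b/, qr/$/) {
    if ($text =~ $anchor) {
        push @widths, $+[0] - $-[0];
    }
}

print "widths: @widths\n";   # every anchor matches, and every width is 0
```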

In the context of our walk in the woods analogy to regexp matching, most regexp elements move us along a trail, but anchors have us stop a moment and check our surroundings. If the local environment checks out, we can proceed forward. But if the local environment doesn't satisfy us, we must backtrack.

Checking the environment entails either looking ahead on the trail, looking behind, or both.

The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for.

Lookahead assertion

The lookahead assertion is denoted by (?=regexp) and the lookbehind assertion is denoted by (?<=fixed-regexp).

This is the positive lookahead (or "trailing context") operator. For example, /\w+(?=\t)/ matches a word only if it is followed by a tab, but the tab is not part of $&. Example:

> cat lookahead.pl
#!/usr/bin/perl

$a = "bugs the rabbit";
$b = "bugs the frog";
# The first $& is escaped so that it prints literally; the second one interpolates.
if ($a =~ m{bugs(?= the cat| the rabbit)}i) { print "$a matches. \$& = $&\n"; }
else { print "$a does not match\n"; }
if ($b =~ m{bugs(?= the cat| the rabbit)}i) { print "$b matches. \$& = $&\n"; }
else { print "$b does not match\n"; }
Running the program gives:
> lookahead.pl
bugs the rabbit matches. $& = bugs
bugs the frog does not match
>
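
The pattern /\w+(?=\t)/ mentioned above can be tried in the same way. This is a minimal sketch, assuming a hypothetical tab-separated record; it shows that the tab is tested but does not become part of $&:

```perl
use strict;
use warnings;

my $line = "name\tvalue";   # hypothetical tab-separated record

my $word;
if ($line =~ /\w+(?=\t)/) {
    $word = $&;             # the tab was checked by the lookahead, not consumed
}

print "word = '$word', length = ", length($word), "\n";
```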

Some examples using the debugger:

  DB<1>       #012345678901234567890
  DB<2>  $x = "I catch the housecat 'Tom-cat' with catnip"
  DB<3>  print "($&) (".pos($x).")\n" if $x  =~ /cat(?=\s)/g
(cat) (20)                    # matches 'cat' in 'housecat'

  DB<5>  $x = "I catch the housecat 'Tom-cat' with catnip" # To reset pos
  DB<6>  x @catwords = ($x =~ /(?<=\s)cat\w+/g)
0  'catch'
1  'catnip'

  DB<7>       #012345678901234567890123456789
  DB<8>  $x = "I catch the housecat 'Tom-cat' with catnip"
  DB<9>  print "($&) (".pos($x).")\n" if $x =~ /\bcat\b/g
(cat) (29) # matches 'cat' in 'Tom-cat'

  DB<10>  $x = "I catch the housecat 'Tom-cat' with catnip"
  DB<11>  x  $x =~ /(?<=\s)cat(?=\s)/
  empty array
  DB<12>  # doesn't match; no isolated 'cat' in middle of $x

A hard RegEx problem

See the node 'A hard RegEx problem' on PerlMonks. A monk asks:

Hi Monks,

I wanna to match this issues:

  1. The string length is between 3 and 10
  2. The string ONLY contains [0-9] or [a-z] or [A-Z], but
  3. The string must contain a number AND a letter at least

Pls help me check. Thanks

Solution:

casiano@millo:~$ perl -wde 0
main::(-e:1):   0
  DB<1> x 'aaa2a1' =~  /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
0  1
  DB<2> x 'aaaaaa' =~  /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
  empty array
  DB<3> x '1111111' =~  /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
  empty array
  DB<4> x '1111111bbbbb' =~  /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
  empty array
  DB<5> x '111bbbbb' =~  /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
0  1
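
Note that \w also accepts the underscore, which requirement 2 excludes. The following is a minimal sketch of a stricter variant, assuming plain ASCII input; the sub name is_valid and the extra test string 'ab_1' are hypothetical and not part of the PerlMonks thread:

```perl
use strict;
use warnings;

# is_valid is a hypothetical helper name used only for this illustration.
sub is_valid {
    my ($s) = @_;
    # The two lookaheads check "has a letter" and "has a digit" from the start
    # of the string; the explicit class then constrains every character and
    # the overall length.
    return $s =~ /\A(?=.*[a-z])(?=.*[0-9])[a-z0-9]{3,10}\z/i ? 1 : 0;
}

for my $s ('aaa2a1', 'aaaaaa', '1111111', '1111111bbbbb', '111bbbbb', 'ab_1') {
    printf "%-14s %s\n", $s, is_valid($s) ? 'ok' : 'rejected';
}
```

With \w in place of the explicit class, 'ab_1' would be accepted, since it has a letter, a digit, and four word characters.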

Lookahead and lookbehind parentheses do not capture

Note that the parentheses in (?=regexp) and (?<=regexp) are non-capturing, since these are zero-width assertions.
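
A minimal sketch of this behaviour, assuming the sample string "foobar": the assertion's own parentheses create no capture group, although an ordinary group written inside the assertion still captures.

```perl
use strict;
use warnings;

my $x = "foobar";

# No capture groups in the pattern: in list context a successful match
# simply returns the one-element list (1).
my @captures = ($x =~ /foo(?=bar)/);

# An ordinary group placed inside the lookahead does capture.
my ($inside) = ($x =~ /foo(?=(bar))/);

print "without inner group: @captures\n";
print "with inner group:    $inside\n";
```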

Limitations of lookbehind

Lookahead (?=regexp) can match arbitrary regexps, but lookbehind (?<=fixed-regexp) only works for regexps of fixed width, i.e., a fixed number of characters long.

Thus (?<=(ab|bc)) is fine, but (?<=(ab)*) is not.
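
The restriction can be observed by compiling both patterns at run time. This is a minimal sketch; the exact error text depends on the Perl version, and recent releases relax the rule for bounded variable lengths, but an unbounded quantifier such as * inside a lookbehind is still rejected:

```perl
use strict;
use warnings;

# The patterns are kept in strings so that they are compiled at run time,
# where eval can trap a compilation error.
my %compiles;
for my $pattern ('(?<=(ab|bc))x', '(?<=(ab)*)x') {
    $compiles{$pattern} = eval { my $re = qr/$pattern/; 1 } ? 1 : 0;
}

printf "%-16s %s\n", $_, $compiles{$_} ? 'compiles' : 'rejected'
    for sort keys %compiles;
```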

Negated lookahead and lookbehind

The negated versions of the lookahead and lookbehind assertions are denoted by (?!regexp) and (?<!fixed-regexp) respectively. They evaluate true if the regexps do not match:

    $x = "foobar";
    $x =~ /foo(?!bar)/;  # doesn't match, 'bar' follows 'foo'
    $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
    $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'

Example: split with lookahead and lookbehind

Here is an example where a string containing blank-separated words, numbers and single dashes is to be split into its components.

Using /\s+/ alone won't work, because spaces are not required between dashes, or between a word and a dash. Additional places for a split are established by looking ahead and behind:

casiano@tonga:~$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> $str = "one two - --6-8"
  DB<2> x @toks = split / \s+ | (?<=\S) (?=-) | (?<=-)  (?=\S)/x, $str
0  'one'
1  'two'
2  '-'
3  '-'
4  '-'
5  6
6  '-'
7  8

Look-around in perlre

The following paragraph is taken from the section 'Look-Around Assertions' in perlre. Let us use it as review material:

Look-around assertions are zero width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Look-behind matches text up to the current match position, look-ahead matches text following the current match position.

Let us look at an example. We want to replace the extension .something with .txt in strings that contain a path to a file:

casiano@millo:~$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> ($b = $a = 'abc/xyz.something') =~ s{\.[^.]*$}{.txt}
  DB<2> p $b
abc/xyz.txt
  DB<3> ($b = $a = 'abc/xyz.something') =~ s/\.\K[^.]*$/txt/;
  DB<4> p $b
abc/xyz.txt
  DB<5> p $a
abc/xyz.something
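
The session above uses a plain substitution and the \K construct. The same replacement can also be written with an explicit lookbehind; the following is a minimal sketch of that alternative, with the variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $path = 'abc/xyz.something';

# (?<=\.) asserts that a dot precedes the extension without making the dot
# part of the match, so only the extension itself is replaced.
(my $renamed = $path) =~ s/(?<=\.)[^.]*$/txt/;

print "$path -> $renamed\n";
```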

See also:

Negative lookahead operator: last occurrence

Write a regular expression that finds the last occurrence of the string foo in a given string.

  DB<6> x ($a = 'foo foo bar bar foo bar bar') =~ /foo(?!.*foo)/g; print pos($a)."\n"
19
  DB<7> x ($a = 'foo foo bar bar foo bar bar') =~ s/foo(?!.*foo)/\U$&/
0  1
  DB<8> x $a
0  'foo foo bar bar FOO bar bar'
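
A script version of the same idea, as a minimal sketch. The /s modifier is an addition not present in the session above: without it, . does not match a newline, so in a multi-line string the pattern could stop at a foo that is only the last one on its own line. The rindex call is included purely as a cross-check.

```perl
use strict;
use warnings;

my $s = "foo foo bar bar foo bar bar";

# $-[0] holds the offset at which the last successful match started.
my $last = $s =~ /foo(?!.*foo)/s ? $-[0] : -1;

# rindex finds the last occurrence of a fixed string; both should agree.
my $check = rindex($s, 'foo');

print "regex: $last, rindex: $check\n";
```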

Differences between negative lookahead and lookahead with a negated class

At first sight, the negative lookahead operator looks similar to the positive lookahead operator applied to a negated character class.

/regexp(?![abc])/
/regexp(?=[^abc])/

However, there are at least two differences:
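
One difference shows up at the end of the string: the negative lookahead is satisfied when nothing follows, while the positive lookahead on a negated class needs an actual character to test. A minimal sketch, assuming the literal string "regexp" as input:

```perl
use strict;
use warnings;

my $s = "regexp";   # nothing follows the word

# Negative lookahead: succeeds, because no a, b or c comes next.
my $negative = $s =~ /regexp(?![abc])/ ? 1 : 0;

# Positive lookahead on a negated class: fails, because it requires one more
# character to exist, and the string has already ended.
my $negated_class = $s =~ /regexp(?=[^abc])/ ? 1 : 0;

print "(?![abc]): $negative, (?=[^abc]): $negated_class\n";
```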

AND and AND NOT

Two more examples:
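
Anchored lookaheads are a common way to express AND and AND NOT between patterns. The following is a minimal sketch of that idiom, assuming hypothetical sample lines and the words foo and bar as the two conditions; it is an illustration, not the original examples of these notes:

```perl
use strict;
use warnings;

my @lines = ("foo and bar", "bar then foo", "only foo", "only bar");

# AND: both lookaheads start at the beginning of the string, so the order
# in which the two words appear does not matter.
my @both = grep { /^(?=.*foo)(?=.*bar)/ } @lines;

# AND NOT: 'foo' must occur somewhere and 'bar' must occur nowhere.
my @foo_only = grep { /^(?=.*foo)(?!.*bar)/ } @lines;

print "both:     ", join(" | ", @both), "\n";
print "foo only: ", join(" | ", @foo_only), "\n";
```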

Negative lookahead versus lookbehind

Note that a negative lookahead cannot easily be used to imitate a lookbehind; that is, the behaviour of (?<!foo)bar cannot be reproduced with something like (?!foo)bar. Bear in mind that:
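
The difference can be seen on the string "foobar". A minimal sketch:

```perl
use strict;
use warnings;

my $s = "foobar";

# (?!foo) is tested at the position where 'bar' begins; the next characters
# are "bar", not "foo", so the assertion holds and the match succeeds.
my $lookahead = $s =~ /(?!foo)bar/ ? 1 : 0;

# (?<!foo) inspects the three characters before 'bar', which are "foo",
# so the assertion fails and there is no match.
my $lookbehind = $s =~ /(?<!foo)bar/ ? 1 : 0;

print "(?!foo)bar: $lookahead, (?<!foo)bar: $lookbehind\n";
```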

Exercises

Exercise 3.2.2  


Casiano Rodríguez León
2013-03-05