Conditional Expressions

Quoting perlre:

A conditional expression is a form of if-then-else statement that allows one to choose which patterns are to be matched, based on some condition.

There are two types of conditional expression: (?(condition)yes-regexp) and (?(condition)yes-regexp|no-regexp).

(?(condition)yes-regexp) is like an if () {} statement in Perl. If the condition is true, the yes-regexp will be matched. If the condition is false, the yes-regexp will be skipped and Perl will move onto the next regexp element.

The second form is like an if () {} else {} statement in Perl. If the condition is true, the yes-regexp will be matched, otherwise the no-regexp will be matched.

The condition can have several forms.

The first form is simply an integer in parentheses (integer). It is true if the corresponding backreference \integer matched earlier in the regexp. The same thing can be done with a name associated with a capture buffer, written as (<name>) or ('name').
The second form is a bare zero width assertion (?...), either a lookahead, a lookbehind, or a code assertion.
The third set of forms provides tests that return true if the expression is executed within a recursion (R) or is being called from some capturing group, referenced either by number (R1) or by name (R&name).
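The lookahead form of the condition is not exercised by the examples in this section. As a minimal sketch (the pattern and the helper name `is_token` are illustrative assumptions, not taken from perlre), a zero-width lookahead can choose the branch by peeking at the next character:

```perl
use strict;
use warnings;

# Hypothetical helper: a token is either all digits or all lowercase letters.
# The condition (?=\d) peeks at the next character without consuming it
# and selects which branch has to match.
sub is_token {
    my ($s) = @_;
    return $s =~ /^(?(?=\d)\d+|[a-z]+)$/ ? 1 : 0;
}

for my $s ('2024', 'perl', '20x', 'x20') {
    printf "%-5s %s\n", $s, is_token($s) ? 'matches' : 'does not match';
}
```

Only the first two strings match: once the lookahead has picked a branch, the other branch is never tried at that position.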

Conditions: group number

A condition can take several forms. The simplest is an integer in parentheses. It is true if the corresponding capture group \integer matched (a name can be used instead if the group is named).

In the regular expression /^(.)(..)?(?(2)a|b)/, if the second group matches, the text matched so far must be followed by an a; if it does not match, it must be followed by a b:

  DB<1> x 'hola' =~ /^(.)(..)?(?(2)a|b)/
0  'h'
1  'ol'
  DB<2> x 'ha' =~ /^(.)(..)?(?(2)a|b)/
  empty array
  DB<3> x 'hb' =~ /^(.)(..)?(?(2)a|b)/
0  'h'
1  undef
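The same test can be written with a named group. This is a sketch only: the group name `pair` is an arbitrary choice made here, and the script form simply replays the three debugger calls above.

```perl
use strict;
use warnings;

# Same logic as /^(.)(..)?(?(2)a|b)/, with the optional group named "pair"
# (the name is an assumption made for this example).
my $re = qr/^(.)(?<pair>..)?(?(<pair>)a|b)/;

for my $s (qw(hola ha hb)) {
    if ($s =~ $re) {
        my $pair = defined $+{pair} ? "'$+{pair}'" : 'undef';
        print "$s: matched, pair = $pair\n";
    }
    else {
        print "$s: no match\n";
    }
}
```

Naming the group keeps the condition valid if more capture groups are later added in front of it.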

Example: strings of the form one-other-other-one

The following search matches strings of the form $x$x or $x$y$y$x:

pl@nereida:~/Lperltesting$ perl5.10.1 -wde 0
main::(-e:1):   0
  DB<1> x 'aa' =~ m{^(\w+)(\w+)?(?(2)\2\1|\1)$}
0  'a'
1  undef
  DB<2> x 'abba' =~ m{^(\w+)(\w+)?(?(2)\2\1|\1)$}
0  'a'
1  'b'
  DB<3> x 'abbc' =~ m{^(\w+)(\w+)?(?(2)\2\1|\1)$}
  empty array
  DB<4> x 'juanpedropedrojuan' =~ m{^(\w+)(\w+)?(?(2)\2\1|\1)$}
0  'juan'
1  'pedro'
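For use outside the debugger, the pattern can be wrapped in a small function. The name `mirror_parts` and its return convention are assumptions made for this sketch:

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical wrapper around the pattern above. In list context it returns
# (x, y) on success, with y undefined for the $x$x form, and an empty list
# when the string does not match.
sub mirror_parts {
    my ($s) = @_;
    return $s =~ /^(\w+)(\w+)?(?(2)\2\1|\1)$/;
}

for my $s (qw(aa abba abbc juanpedropedrojuan)) {
    my @parts = mirror_parts($s);
    if (@parts) {
        my ($x, $y) = @parts;
        say "$s: x=$x y=", $y // '(none)';
    }
    else {
        say "$s: no match";
    }
}
```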

Conditions: Code

The condition can also be a code block:

  DB<1> $a = 0; print "$&" if 'hola' =~ m{(?(?{$a})hola|adios)} # No match

  DB<2> $a = 1; print "$&" if 'hola' =~ m{(?(?{$a})hola|adios)}
hola
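A code condition lets ordinary program state steer the match. The following sketch assumes a package variable `$allow_hex` and a helper `is_number`, both invented for illustration:

```perl
use strict;
use warnings;

# Illustrative package variable read by the code condition at match time.
our $allow_hex = 0;

# Hypothetical helper: hex digits are accepted only while $allow_hex is true.
sub is_number {
    my ($s) = @_;
    return $s =~ /^(?(?{ $allow_hex })[0-9a-f]+|[0-9]+)$/ ? 1 : 0;
}

print "ff with allow_hex=0: ", is_number('ff'), "\n";
{
    local $allow_hex = 1;    # restored automatically at the end of the block
    print "ff with allow_hex=1: ", is_number('ff'), "\n";
}
print "42 with allow_hex=0: ", is_number('42'), "\n";
```

The code block is evaluated each time the regex engine reaches the condition, so the same pattern behaves differently as the variable changes.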

Example: strings with an optional opening parenthesis (not nested)

The following regular expression uses a conditional to require that a string starting with an opening parenthesis ends with a closing parenthesis. If the string does not start with an opening parenthesis, it must not end with a closing one:

pl@nereida:~/Lperltesting$ cat conditionalregexp.pl
#!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w
use v5.10;
use strict;

my $r = qr{(?x)                # ignore whitespace in the pattern
            ^
            ( \( )?            # optional opening parenthesis
            [^()]+             # body without parentheses
            (?(1)              # did we start with a parenthesis?
              \)               # if so, require the closing one
            )
            $
          };
say "<$&>" if '(abcd)' =~ $r;
say "<$&>" if 'abc' =~ $r;
say "<(abc> does not match" unless '(abc' =~ $r;
say "<abc)> does not match" unless 'abc)' =~ $r;

Running this program produces:

pl@nereida:~/Lperltesting$ ./conditionalregexp.pl
<(abcd)>
<abc>
<(abc> does not match
<abc)> does not match
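The same idea extends to other delimiter pairs. As a sketch under the assumption that both delimiters are single characters, a hypothetical builder `delimited_re` can produce the pattern for any pair:

```perl
use strict;
use warnings;

# Hypothetical generalization of the pattern above: build the regexp for
# any pair of single-character delimiters.
sub delimited_re {
    my ($open, $close) = @_;
    my ($o, $c) = map { quotemeta($_) } ($open, $close);
    # Optional opener, a body free of both delimiters, and the closer
    # required only when group 1 (the opener) matched.
    return qr/^($o)?[^$o$c]+(?(1)$c)$/;
}

my $square = delimited_re('[', ']');
for my $s ('[abc]', 'abc', '[abc', 'abc]') {
    print "$s: ", ($s =~ $square ? 'match' : 'no match'), "\n";
}
```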

Conditional Expressions with `(R)`

The following example shows the (R) condition, which checks whether the expression is being evaluated inside a recursion:

pl@nereida:~/Lperltesting$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> x 'bbaaaabb' =~ /(b(?(R)a+|(?0))b)/
0  'bbaaaabb'
  DB<2> x 'bb' =~ /(b(?(R)a+|(?0))b)/
  empty array
  DB<3> x 'bab' =~ /(b(?(R)a+|(?0))b)/
  empty array
  DB<4> x 'bbabb' =~ /(b(?(R)a+|(?0))b)/
0  'bbabb'

The subexpression (?(R)a+|(?0)) says: if it is being evaluated recursively, accept a+; otherwise, evaluate the whole regexp recursively.
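The numbered form of the condition, (R1), works the same way when the recursion targets a specific group. In this sketch the recursion goes into group 1 with (?1) instead of into the whole pattern, which makes it possible to anchor the match; the test strings are chosen here for illustration:

```perl
use strict;
use warnings;

# (?1) recurses into group 1 only, so ^ and $ stay outside the recursion.
# (R1) is true only while the engine is executing inside that recursion.
my $re = qr/^(b(?(R1)a+|(?1))b)$/;

for my $s (qw(bbaaaabb bbabb bab bb xbbabbx)) {
    print "$s: ", ($s =~ $re ? 'match' : 'no match'), "\n";
}
```

The last string contains bbabb but is rejected because of the anchors.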

Example: palindromes with Spanish accent equivalence

This exercise generalizes the regular expression introduced in section 3.2.5 for recognizing word palindromes. The goal is a regexp that accepts a Spanish sentence whose forward and backward readings may differ in accentuation (as in the classic palindrome dábale arroz a la zorra el abad). A trivial solution is to preprocess the string and remove the accents. We will assume, however, that we want to work on the original string. Here is a partial solution (for readability, only the vowels a and e are considered):

pl@nereida:~/Lperltesting$ cat spanishpalin.pl
#!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w -CIOEioA
use v5.10;
use strict;
use utf8;

my $regexp = qr/^(?<pal>\W* (?:
                            (?<L>(?<a>[áa])|(?<e>[ée])|\w) # letter
                            (?&pal)                        # nested palindrome
                            (?(<a>)[áa]                    # if it is an "a" group
                                  |(?:(?(<e>)[ée]          # if it is an "e" group
                                            |\g{L}         # exact match
                                      )                    # end if [ée]
                                   )                       # end group
                            )                              # end if [áa]
                          | \w?                            # non-recursive case
                      ) \W*                                # punctuation symbols
                  )
                $
               /ix;

my $input = <>; # Try: 'dábale arroz a la zorra el abad';
chomp($input);
if ($input =~ $regexp) {
  say "$input is a palindrome";
}
else {
  say "$input does not match";
}

Execution:

pl@nereida:~/Lperltesting$ ./spanishpalin.pl
dábale arroz a la zorra el abad
dábale arroz a la zorra el abad is a palindrome
pl@nereida:~/Lperltesting$ ./spanishpalin.pl
óuuo
óuuo does not match
pl@nereida:~/Lperltesting$ ./spanishpalin.pl
éaáe
éaáe is a palindrome
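For comparison, the trivial alternative mentioned earlier (preprocessing the string to remove accents) can be sketched as follows. The helper name is hypothetical, and the approach is deliberately cruder than the regexp: stripping every combining mark also makes ñ equal to n and ü equal to u, and it covers all vowels rather than only a and e.

```perl
use strict;
use warnings;
use utf8;                         # this source contains accented literals
use Unicode::Normalize qw(NFD);   # core module

# Hypothetical helper, not part of spanishpalin.pl.
sub is_palindrome_ignoring_accents {
    my ($text) = @_;
    my $t = NFD(lc $text);     # decompose: "á" becomes "a" + combining acute
    $t =~ s/\p{Mn}+//g;        # drop the combining marks
    $t =~ s/\W+//g;            # drop spaces and punctuation
    my $reversed = reverse $t; # scalar context: reverses the characters
    return $t eq $reversed ? 1 : 0;
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $s ('dábale arroz a la zorra el abad', 'óuuo', 'hola') {
    print "$s: ", (is_palindrome_ignoring_accents($s) ? 'palindrome' : 'not a palindrome'), "\n";
}
```

Note that óuuo is accepted here, whereas the partial regexp above rejects it because it has no rule for the vowel o.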

We used the -CIOEioA option to make sure that the input, output and error streams and the command line are handled as UTF-8.

This is what the perlrun documentation says about it:

The -C flag controls some of the Perl Unicode features.

As of 5.8.1, the -C can be followed either by a number or a list of option letters. The letters, their numeric values, and effects are as follows; listing the letters is equal to summing the numbers.

    I     1   STDIN is assumed to be in UTF-8
    O     2   STDOUT will be in UTF-8
    E     4   STDERR will be in UTF-8
    S     7   I + O + E
    i     8   UTF-8 is the default PerlIO layer for input streams
    o    16   UTF-8 is the default PerlIO layer for output streams
    D    24   i + o
    A    32   the @ARGV elements are expected to be strings encoded
              in UTF-8
    L    64   normally the "IOEioA" are unconditional,
              the L makes them conditional on the locale environment
              variables (the LC_ALL, LC_CTYPE, and LANG, in the order
              of decreasing precedence) -- if the variables indicate
              UTF-8, then the selected "IOEioA" are in effect
    a   256   Set ${^UTF8CACHE} to -1, to run the UTF-8 caching code in
              debugging mode.

For example, -COE and -C6 will both turn on UTF-8-ness on both STDOUT and STDERR. Repeating letters is just redundant, not cumulative nor toggling.
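The letter arithmetic can be checked with a few lines of Perl. This is only an illustration of the table above; the function name `c_flag_number` is made up, and bitwise OR is used so that repeated letters are redundant rather than cumulative:

```perl
use strict;
use warnings;

# Numeric values of the -C option letters, copied from the table above.
my %value = (
    I => 1, O => 2,  E => 4,  S => 7,  i => 8,
    o => 16, D => 24, A => 32, L => 64, a => 256,
);

# Hypothetical helper: numeric equivalent of a list of -C letters.
sub c_flag_number {
    my ($letters) = @_;
    my $n = 0;
    for my $letter (split //, $letters) {
        die "unknown -C letter '$letter'\n" unless exists $value{$letter};
        $n |= $value{$letter};    # OR, so repeating a letter changes nothing
    }
    return $n;
}

printf "-C%-7s = -C%d\n", $_, c_flag_number($_) for qw(OE IOEioA SDL OO);
```

Under these values, OE gives 6, IOEioA gives 63 and SDL gives 95.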

The io options mean that any subsequent open() (or similar I/O operations) will have the :utf8 PerlIO layer implicitly applied to them, in other words, UTF-8 is expected from any input stream, and UTF-8 is produced to any output stream. This is just the default, with explicit layers in open() and with binmode() one can manipulate streams as usual.
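The effect of an explicit layer can be seen without touching any real file. The sketch below reads the same three bytes twice from an in-memory filehandle, once raw and once through an :encoding(UTF-8) layer; the variable names are illustrative:

```perl
use strict;
use warnings;

# UTF-8 encoding of a-acute followed by "b": 3 bytes, 2 characters.
my $utf8_bytes = "\xC3\xA1b";

# No layer: each byte is read as one character.
open my $raw, '<', \$utf8_bytes or die "cannot open: $!";
my $as_bytes = <$raw>;
close $raw;

# Explicit decoding layer: the two-byte sequence becomes one character.
open my $decoded, '<:encoding(UTF-8)', \$utf8_bytes or die "cannot open: $!";
my $as_chars = <$decoded>;
close $decoded;

print length($as_bytes), " bytes, ", length($as_chars), " characters\n";
```

An explicit layer on open() affects only that handle, whereas the i and o letters change the default for every stream opened afterwards.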

-C on its own (not followed by any number or option list), or the empty string "" for the PERL_UNICODE environment variable, has the same effect as -CSDL. In other words, the standard I/O handles and the default open() layer are UTF-8-fied but only if the locale environment variables indicate a UTF-8 locale. This behaviour follows the implicit (and problematic) UTF-8 behaviour of Perl 5.8.0.

You can use -C0 (or 0 for PERL_UNICODE) to explicitly disable all the above Unicode features.

The use utf8 pragma tells Perl that the source code is encoded in UTF-8, so string literals are handled with character semantics (for example, the regexp /./ matches one Unicode character). The use bytes pragma switches from character semantics to byte semantics (the regexp /./ matches one byte).

lhp@nereida:~/Lperl/src/testing$ cat dot_utf8_2.pl
#!/usr/local/bin/perl -w
use strict;
use utf8;
use charnames qw{greek};

binmode(STDOUT, ':utf8');

my $x = 'αβγδεφ';

my @w = $x =~ /(.)/g;
print "@w\n";

{
  use bytes;
  my @v = map { ord } $x =~ /(.)/g;
  print "@v\n";
}

Running the program produces the following output:

pl@nereida:~/Lperltesting$ perl dot_utf8_2.pl
α β γ δ ε φ
206 177 206 178 206 179 206 180 206 181 207 134
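The same byte values can be obtained without use bytes, whose own documentation discourages it outside debugging. A minimal sketch using the core Encode module, with only the first two letters and illustrative variable names:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "\x{3b1}\x{3b2}";          # alpha and beta as characters
my $bytes = encode('UTF-8', $chars);   # the same text as UTF-8 bytes

my @codepoints = map { ord } split //, $chars;
my @octets     = map { ord } split //, $bytes;

print "characters: @codepoints\n";
print "bytes:      @octets\n";
```

The octets for alpha and beta are 206 177 and 206 178, matching the first four numbers printed by dot_utf8_2.pl.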

Casiano Rodríguez León
2013-03-05
