I'm trying to extract an image this way:
$url = 'https://m.fa.com/perfil123'; // any profile
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0); // do not include response headers in the output
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, "gzip,deflate"); // accept gzip/deflate encoding
$url = curl_exec($ch); // stores the page's response
curl_close($ch);
preg_match('#class="bo img" src=[^"]*"([^"]*)"#', $url, $datos);
$img = $datos[1];
echo $img;
This is the HTML of the image I'm looking for:
<img width="72" height="72" alt="" class="bo img" src="https://scontent-mia3-2.xx.fbcdn.net/v/t1.0-1/cp0/e15/q65/p74x74/21151613_1725782907724134_7535903357386699205_n.jpg?efg=eyJpIjoiYiJ9&oh=4f22a577f965566b2016ef842f5b110f&oe=5A1DE043">
I'm using the class "bo img" to identify the image, but I don't know where the error is.
With regex (not recommended)
As I mentioned, the regular expression you are using matches the HTML in your question perfectly (see demo). However, using regex for this is not recommended. For example:
Nothing checks that the match is inside an <img> tag, so with <input type='text' value='class="bo img" src="url.jpg"'> you would have a problem... and it can be easily solved, but...
With class="bo img" data-ejemplo="bla" src="url.jpg" you would have a problem... and it can be easily solved, but...
With class="bo img" you would have a problem... and it can be easily solved, but...
With <!-- <img class="bo img" src="url.jpg"> --> you would have a problem... and it can be solved, but...
It's probably better to modify the expression (see it on regex101), but even so, it would fail in many cases.
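To make these cases concrete, here is a small sketch that runs the pattern from the question against a few inputs modelled on the examples above. The example.com URL, the array labels, and the variable names are made up for illustration; only the pattern itself comes from the question.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '#class="bo img" src=[^"]*"([^"]*)"#';

// Hypothetical inputs modelled on the cases listed above.
$cases = [
    'question markup' => '<img width="72" height="72" alt="" class="bo img" src="https://example.com/a.jpg">',
    'not an img tag'  => '<input type=\'text\' value=\'class="bo img" src="url.jpg"\'>',
    'extra attribute' => '<img class="bo img" data-ejemplo="bla" src="url.jpg">',
    'commented out'   => '<!-- <img class="bo img" src="url.jpg"> -->',
];

$results = [];
foreach ($cases as $label => $html) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    $results[$label] = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
    echo $label, ': ', var_export($results[$label], true), PHP_EOL;
}
```

The first input matches as expected. The <input> and the commented-out <img> also match, which are false positives, while the real <img> with an extra attribute between class and src is missed.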
Using DOM (recommended)
You shouldn't use regular expressions to process HTML. At the level of detail your expression works at, even a small change to the HTML would make it fail. An extra space, a change in the tag's attributes, a comment, or a more complex structure would break even a gigantic regex. A very advanced expression could get close to fail-safe, but you could almost always find a rare case that breaks it. It would also take an expert every time you wanted to modify it.
It is very easy to process HTML with the DOM; these are the tools designed for that.
Given the HTML of the page, the first step is simply to generate the DOM from it.
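As a minimal sketch of this step, assuming the page has already been downloaded into a string, PHP's DOMDocument class can build the DOM. The sample markup and the example.com URLs are made up for illustration; the first <img> is modelled on the one in the question.

```php
<?php
// Hypothetical sample markup, not the real page.
$html = '<html><body>'
      . '<img width="72" height="72" alt="" class="bo img" src="https://example.com/perfil.jpg">'
      . '<img class="otra" src="https://example.com/otra.jpg">'
      . '</body></html>';

// Real-world pages are rarely valid HTML, so collect libxml warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html); // true on success
libxml_clear_errors();

echo $loaded ? 'parsed' : 'failed', PHP_EOL;
```

In the question's script, the string returned by curl_exec() would take the place of the sample $html.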
From the DOM we can then get all the images, go through them, obtain the classes of each one, and read the image URL.
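The steps above can be sketched as follows. This is one possible way to do it, not necessarily the code from the original answer: the helper name imageSourcesByClass, the sample markup, and the example.com URLs are all assumptions made for illustration.

```php
<?php
/**
 * Return the src of every <img> whose class attribute contains all of the
 * given class names. Hypothetical helper, named here for illustration.
 */
function imageSourcesByClass(string $html, array $wanted): array
{
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        // The class attribute is a whitespace-separated list of names.
        $classes = preg_split('/\s+/', trim($img->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        // Keep the image only if none of the wanted classes is missing.
        if (array_diff($wanted, $classes) === []) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Sample markup covering the cases that tripped up the regex.
$html = '<html><body>'
      . '<img class="bo img" data-ejemplo="bla" src="https://example.com/perfil.jpg">'
      . '<!-- <img class="bo img" src="https://example.com/comentada.jpg"> -->'
      . '<img class="otra" src="https://example.com/otra.jpg">'
      . '</body></html>';

$found = imageSourcesByClass($html, ['bo', 'img']);
print_r($found);
```

With this sample markup, only https://example.com/perfil.jpg is returned: the extra attribute between class and src does not matter, the commented-out image is ignored because it is a comment node rather than an element, and the image with a different class is skipped.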
Code:
Result:
Demonstration:
Run on 3v4l.org