The following code fetches a website for me exactly as it is:
```php
<?php
$URL = "http://domain.com/embed.html";
$domain = file_get_contents($URL);
echo $domain;
?>
```
But what parameters or filters should I add to get only a certain part of a site, remove or replace links and content, and block the execution of scripts?

I also found an example that uses jQuery to edit an external page through an iframe without "access denied" errors, etc. But I don't understand exactly how it works, or which values I should use to remove or replace links and content and to block script execution. Can you explain?
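```html
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.1/jquery.min.js"></script>
<iframe name="frametest" id="frametest" src="http://example.com"></iframe>
<script type="text/javascript">
  var cleanit;

  $(document).ready(function () {
    // Poll the iframe every 500 ms until the expected content appears
    cleanit = setInterval(cleaning, 500);
  });

  function cleaning() {
    // .contents() can only read the iframe if its page has the same origin as this one
    if ($('#frametest').contents().find('.selector').html() == "something") {
      clearInterval(cleanit);
      $('#frametest').contents().find('.Link').html('ideate tech');
    }
  }
</script>
```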
I have looked at the new HTML5 attributes mentioned in this question, which can be applied directly to an iframe, but I don't want to do it that way: in some places everything freezes and nothing is displayed.
For what you are trying to do, I suggest a few regular expressions that can help you; however, for the more specific cases you need, you will have to work out which regular expression suits your needs.
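Once the web content has been obtained with $domain = file_get_contents($URL); (as in your code), you can process the $domain variable as follows.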
To remove certain tags, such as <script>, you can apply a regular expression with preg_replace ( https://stackoverflow.com/questions/1886740/php-remove-javascript ).
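A minimal sketch of that regex approach, assuming $domain holds the HTML returned by file_get_contents and PHP 7 or later. The helper name strip_scripts is hypothetical. It removes <script>...</script> blocks only; inline handlers such as onclick are left in place, and a regex is not a reliable HTML parser, so treat this as a quick fix for simple pages.

```php
<?php
// Hypothetical helper: removes every <script ...>...</script> block.
// "i" = case-insensitive, "s" = "." also matches newlines inside the script.
function strip_scripts(string $html): string
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

// Example input; in practice this is the result of file_get_contents($URL).
$domain = '<p>Hello</p><script type="text/javascript">alert("x");</script><p>World</p>';
echo strip_scripts($domain); // <p>Hello</p><p>World</p>
```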
If you want to replace some specific text, you can use regular expressions or PHP's str_replace function applied to the $domain variable ( http://php.net/manual/es/function.str-replace.php ).
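For example, a sketch with str_replace on a hypothetical $domain string (the replacement text "ideate tech" is borrowed from the jQuery example in the question). str_replace does plain, case-sensitive text matching; when given arrays, it replaces each search string with the replacement at the same position.

```php
<?php
// Example input; in practice $domain comes from file_get_contents($URL).
$domain = '<p>Visit <a class="Link" href="/about">About us</a></p>';

// str_replace($search, $replace, $subject) replaces every literal occurrence.
$domain = str_replace('About us', 'ideate tech', $domain);

// With arrays, several strings are replaced in one call (paired by position).
$domain = str_replace(['<p>', '</p>'], ['<div>', '</div>'], $domain);

echo $domain; // <div>Visit <a class="Link" href="/about">ideate tech</a></div>
```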
And to replace links, see ( https://stackoverflow.com/questions/14573553/php-file-get-contents-replace-all-urls-in-all-a-href-links ).
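A possible sketch of link replacement with preg_replace_callback, assuming PHP 8 (for str_starts_with). The names replace_links, BASE and $absolutize are hypothetical; the example rule turns root-relative links into absolute ones on http://domain.com (the host from the question) so they still work once the HTML is served from your own site. Only double-quoted href="..." attributes are handled.

```php
<?php
// Hypothetical base: the site the HTML was fetched from.
const BASE = 'http://domain.com';

// Rewrites the value of every href="..." attribute through a callback.
// The lookbehind requires whitespace before "href", so data-href is not matched.
function replace_links(string $html, callable $rewrite): string
{
    return preg_replace_callback(
        '#(?<=\s)href\s*=\s*"([^"]*)"#i',
        fn(array $m): string => 'href="' . $rewrite($m[1]) . '"',
        $html
    );
}

// Example rule: "/path" becomes "http://domain.com/path"; "//cdn..." and absolute URLs stay.
$absolutize = function (string $url): string {
    return (str_starts_with($url, '/') && !str_starts_with($url, '//')) ? BASE . $url : $url;
};

$domain = '<a href="/contact">Contact</a> <a href="https://other.org/">Other</a>';
echo replace_links($domain, $absolutize);
// <a href="http://domain.com/contact">Contact</a> <a href="https://other.org/">Other</a>
```

To disable links instead, pass a callback that always returns '#'.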
I hope this helps.
If I have understood correctly, you are trying to display part of another web page (or a modified version of it) inside one of your own pages.
To do this, one of the alternatives you propose is to embed it in an iframe and modify its content with JavaScript.
If your page and the page inside the iframe do not share the same origin (the same scheme, domain and port), this will never work, because allowing it would be a huge security hole.
Example: I have a web page at mydomain.com and I add an iframe that points to Gmail. If the same-origin policy did not exist, I could modify the Gmail page so that, when a user logged into Gmail through the iframe on my site, I could capture and save the password they used. And even without saving the password, I could read the user's emails once they had logged in.
The restriction also applies in reverse: a page inside an iframe cannot access data from the parent page unless both share the same origin.
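To make "same origin" concrete, here is a simplified model of the comparison, written in PHP only for illustration: the browser enforces the real policy itself, and it handles more cases than this. The function names origin and same_origin are hypothetical, and only the http and https default ports are filled in.

```php
<?php
// Builds "scheme://host:port" for a URL, filling in the default port
// (443 for https, otherwise 80) when the URL does not state one.
function origin(string $url): string
{
    $p = parse_url($url);
    $scheme = strtolower($p['scheme'] ?? '');
    $host = strtolower($p['host'] ?? '');
    $port = $p['port'] ?? ($scheme === 'https' ? 443 : 80);
    return "$scheme://$host:$port";
}

// Two URLs share an origin only if scheme, host and port are all identical.
function same_origin(string $a, string $b): bool
{
    return origin($a) === origin($b);
}

var_dump(same_origin('http://mydomain.com/page', 'http://mydomain.com:80/other')); // bool(true)
var_dump(same_origin('http://mydomain.com/', 'https://mail.google.com/'));        // bool(false)
var_dump(same_origin('http://mydomain.com/', 'http://www.mydomain.com/'));        // bool(false)
```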
More detail:
http://notasjs.blogspot.com/2013/09/politica-del-mismo-origen-same-origin.html
https://es.wikipedia.org/wiki/Pol%C3%ADtica_del_same_origin
A possible solution to your problem is the file_get_contents approach you mentioned. But keep in mind that file_get_contents fetches only the URL you give it and returns the body of the response.
So you are not retrieving a complete web page, only its HTML. If, for example, you also wanted the images, you would have to search the retrieved HTML for
<img>
tags, read their src attribute, and make a new request for each image.
Also, the page you are trying to retrieve may use AJAX to display or modify some of its content. In that case, file_get_contents only gives you the base/initial state of the page.
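The image step described above could be sketched like this, assuming PHP 8 with the standard DOM extension. The helper image_urls and the base URL are hypothetical; it only resolves root-relative paths ("/logo.png"), not paths like "img/a.png".

```php
<?php
// Hypothetical helper: returns the src of every <img> in an HTML string,
// making root-relative paths absolute so each image can be requested.
function image_urls(string $html, string $base): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml from printing warnings.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue; // skip <img> tags without a src
        }
        $urls[] = (str_starts_with($src, '/') && !str_starts_with($src, '//')) ? $base . $src : $src;
    }
    return $urls;
}

$html = '<p><img src="/logo.png"><img src="https://cdn.example.com/a.jpg"></p>';
$urls = image_urls($html, 'http://domain.com');
print_r($urls);

// Each image then needs its own request, e.g.:
// $bytes = file_get_contents($urls[0]);
```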
If the content you are interested in is loaded via AJAX, you should inspect the HTTP requests the page makes (for example, in your browser's developer tools) and make the same request for each of them.
Also note that once you have retrieved the text of a website, you can search and replace within it using the preg_replace function, which supports regular expressions for complex searches and replacements.
Perhaps a better alternative, though, would be to use an HTML parser.
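A sketch of the parser approach, assuming PHP 8 with the DOM extension: it keeps only one element of the fetched page (which covers "only a certain part of a site"), removes its <script> elements and any attribute starting with "on" (onclick, onload, ...), and returns that element's HTML. The function extract_part and the id "content" are hypothetical placeholders; use the id of the part you actually want, and only pass trusted ids, since the id is inserted into the XPath expression as-is.

```php
<?php
// Hypothetical helper: returns the HTML of the element with the given id,
// without scripts or inline on* event handlers, or null if it is not found.
function extract_part(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $node = $xpath->query("//*[@id='$id']")->item(0);
    if ($node === null) {
        return null;
    }

    // Copy the matches into arrays first so the tree can be modified while looping.
    foreach (iterator_to_array($xpath->query('.//script', $node)) as $script) {
        $script->parentNode->removeChild($script);
    }
    $handlers = $xpath->query('descendant-or-self::*/@*[starts-with(name(), "on")]', $node);
    foreach (iterator_to_array($handlers) as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    $out = $doc->saveHTML($node);
    return $out === false ? null : $out;
}

// Example page; in practice this is the result of file_get_contents($URL).
$page = '<html><body><div id="nav">menu</div>'
      . '<div id="content"><p onclick="steal()">Text</p><script>alert(1)</script></div>'
      . '</body></html>';
echo extract_part($page, 'content');
```

Compared with regular expressions, the parser copes with attribute order, quoting and nesting, which is why it is usually the more robust choice for HTML.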