I need a PHP script that checks a website for internal duplicate content.
I want to be able to pass a URL to the script through a simple HTML form and set the processing time. The script should then do the following:
1. Get all the site's pages.
2. Extract the title, description and the body from each of the site's pages.
3. Compare the results and produce a user-friendly report showing the duplicate data found and the URLs in question.
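To make steps 2 and 3 more concrete, here is a minimal sketch of what I have in mind, not a required implementation. It assumes the pages have already been fetched (the crawling in step 1 is left out), and the function names extractFields, normalizeText and findDuplicates are hypothetical. It only catches exact matches after lowercasing and whitespace cleanup.

```php
<?php
// Hypothetical helper: lowercase and collapse whitespace so trivial
// formatting differences do not hide a duplicate.
function normalizeText(string $text): string
{
    return strtolower(trim((string) preg_replace('/\s+/', ' ', $text)));
}

// Hypothetical helper: pull the title, meta description and body text
// out of one page's HTML.
function extractFields(string $html): array
{
    if (trim($html) === '') {
        return ['title' => '', 'description' => '', 'body' => ''];
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world markup is often malformed
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Drop script and style elements so their contents are not counted as body text.
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    return [
        'title'       => normalizeText($xpath->evaluate('string(//title)')),
        // Assumption: the description lives in <meta name="description"> (lowercase name).
        'description' => normalizeText($xpath->evaluate('string(//meta[@name="description"]/@content)')),
        'body'        => normalizeText($xpath->evaluate('string(//body)')),
    ];
}

// Hypothetical helper: group URLs whose title, description or body are identical.
// $pages maps URL => fields as returned by extractFields().
function findDuplicates(array $pages): array
{
    $groups = [];
    foreach ($pages as $url => $fields) {
        foreach ($fields as $field => $value) {
            if ($value === '') {
                continue; // a missing field is not a duplicate
            }
            $groups[$field][md5($value)][] = $url;
        }
    }

    $report = [];
    foreach ($groups as $field => $byHash) {
        foreach ($byHash as $urls) {
            if (count($urls) > 1) {
                $report[$field][] = $urls;
            }
        }
    }
    return $report;
}

// Example input: two made-up pages that share a title but nothing else.
$html = [
    'https://example.com/a' => '<html><head><title>Home</title><meta name="description" content="Welcome"></head><body><p>Hello   world</p></body></html>',
    'https://example.com/b' => '<html><head><title>home</title></head><body><p>Other text</p></body></html>',
];
$report = findDuplicates(array_map('extractFields', $html));
```

Here $report would list the two example URLs under "title" only. Near-duplicate detection (for example with PHP's similar_text) and character-encoding handling are deliberately left out of this sketch.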
Please note that if I find a candidate who handles this job well, there will be more projects in the future.