URLFinder
The ConvertHelper_URLFinder class detects URLs and email addresses in a string.
use AppUtils\ConvertHelper;
$subject = 'Lorem ipsum dolor https://mistralys.com tempor incididunt https://github.com ullamco laboris.';
$urls = ConvertHelper::createURLFinder($subject)->getURLs();
This will return an indexed array with the following URLs, in the order they are found:
https://mistralys.com
https://github.com
NOTE: Duplicate URLs are trimmed (see "Detecting duplicate URLs").
Use getEmails() to fetch all email addresses instead of the regular URLs.
use AppUtils\ConvertHelper;
$subject = 'Lorem ipsum dolor: info@mistralys.com tempor incididunt, webmaster@mistralys.com.';
$emails = ConvertHelper::createURLFinder($subject)->getEmails();
This will return the following email addresses, in the order they were found:
mailto:info@mistralys.com
mailto:webmaster@mistralys.com
Use enableSorting() to sort the results alphabetically.
As the previous example shows, all email addresses are returned with the mailto: prefix. This can be deactivated with omitMailto().
use AppUtils\ConvertHelper;
$subject = 'Lorem ipsum dolor: info@mistralys.com tempor incididunt, webmaster@mistralys.com.';
$emails = ConvertHelper::createURLFinder($subject)
->omitMailto()
->getEmails();
This will return the following email addresses:
info@mistralys.com
webmaster@mistralys.com
By default, URLs and email addresses are fetched separately, with getURLs() and getEmails() respectively. After calling includeEmails(), getURLs() returns both regular URLs and email addresses. Sorting then applies to both as well.
The resulting list can be sorted alphabetically by calling enableSorting():
use AppUtils\ConvertHelper;
$subject = 'Lorem ipsum dolor https://mistralys.com tempor incididunt https://github.com ullamco laboris.';
$urls = ConvertHelper::createURLFinder($subject)
->enableSorting()
->getURLs();
This will return the following URLs, sorted alphabetically:
https://github.com
https://mistralys.com
By default, the finder recognizes duplicates only if the URLs are exactly the same. If the host is written in a different case, for example, it will not detect that https://GitHub.com is the same as https://github.com.
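The effect of exact matching can be illustrated with a minimal standalone sketch. It does not use the finder itself; it relies on PHP's array_unique() under the assumption that exact string comparison behaves like the default duplicate detection described above. The sample list is made up for illustration.

```php
<?php
declare(strict_types=1);

// Sample matches in the order found; the last entry repeats the second one exactly.
$found = ['https://GitHub.com', 'https://github.com', 'https://github.com'];

// array_unique() compares the strings as-is, so only the exact repeat is dropped.
// array_values() reindexes the result from 0.
$unique = array_values(array_unique($found));

print_r($unique);
```

Both spellings of the host survive, because they differ as strings.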
The normalizing feature will detect such duplicates. It even detects URLs that have the same query parameters, but in a different order. This feature can be enabled using enableNormalizing().
use AppUtils\ConvertHelper;
$subject = '
// Different case
https://GitHub.com
https://github.com
HTTPS://GITHUB.COM
// Same parameters, different order
https://github.com?paramB=bar&paramA=foo
https://github.com?paramA=foo&paramB=bar
';
$urls = ConvertHelper::createURLFinder($subject)
->enableNormalizing()
->getURLs();
This will return the following URLs:
https://github.com
https://github.com?paramA=foo&paramB=bar
Normalizing will work on any combination of case and parameters.
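To make the idea of normalizing more concrete, here is a rough standalone sketch of how such a comparison could work. It is not the library's implementation: normalizeUrlKey() and uniqueNormalized() are hypothetical helpers written for this illustration, and they only cover the two cases mentioned above (letter case of scheme and host, and query parameter order).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of AppUtils: builds a comparison key by
 * lowercasing scheme and host and sorting the query parameters by name.
 * Port, credentials and fragment are ignored in this sketch.
 */
function normalizeUrlKey(string $url): string
{
    $parts = parse_url($url);

    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // Not parseable: fall back to exact comparison.
    }

    $key = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    $key .= $parts['path'] ?? '';

    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
        ksort($params); // Same parameters in any order give the same key.
        $key .= '?' . http_build_query($params, '', '&');
    }

    return $key;
}

/**
 * Hypothetical helper: keeps one normalized entry per distinct key,
 * in the order the keys are first seen.
 */
function uniqueNormalized(array $urls): array
{
    $seen = [];

    foreach ($urls as $url) {
        $key = normalizeUrlKey($url);
        $seen[$key] = $key;
    }

    return array_values($seen);
}

$result = uniqueNormalized([
    'https://GitHub.com',
    'https://github.com',
    'HTTPS://GITHUB.COM',
    'https://github.com?paramB=bar&paramA=foo',
    'https://github.com?paramA=foo&paramB=bar',
]);

print_r($result);
```

With the five sample URLs, the sketch ends up with two entries: https://github.com and https://github.com?paramA=foo&paramB=bar.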
When finding URLs in HTML documents, it is also possible to extract all relative URLs from known tag attributes like href and src, by enabling the feature with enableHTMLAttributes():
use AppUtils\ConvertHelper;
$html =
'<html>'.
'<head>'.
'<script src="libraries/js/site.js"></script>'.
'<link href="libraries/css/layout.css">'.
'</head>'.
'<body>'.
'<a href="https://github.com">GitHub</a>'.
'</body>'.
'</html>';
$urls = ConvertHelper::createURLFinder($html)
->enableHTMLAttributes()
->getURLs();
This will extract the following URLs from the document:
libraries/js/site.js
libraries/css/layout.css
https://github.com
This will work only in HTML or XML documents.
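For readers curious what attribute-based extraction involves, the following is a simplified standalone sketch. It is an assumption-laden illustration rather than the finder's actual logic: extractAttributeUrls() is a hypothetical helper that uses a regular expression to collect quoted href and src values in document order.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of AppUtils: returns the values of all
 * quoted href and src attributes, in the order they appear in the markup.
 * Limitation of this sketch: it also matches attribute names that merely
 * end in "-href" or "-src", and it ignores unquoted attribute values.
 */
function extractAttributeUrls(string $html): array
{
    // Group 1 captures the opening quote, group 2 the attribute value;
    // the backreference \1 requires the same quote character to close it.
    $pattern = '/\b(?:href|src)\s*=\s*(["\'])(.*?)\1/i';

    preg_match_all($pattern, $html, $matches);

    return $matches[2];
}

$html =
    '<html>'.
    '<head>'.
    '<script src="libraries/js/site.js"></script>'.
    '<link href="libraries/css/layout.css">'.
    '</head>'.
    '<body>'.
    '<a href="https://github.com">GitHub</a>'.
    '</body>'.
    '</html>';

$urls = extractAttributeUrls($html);

print_r($urls);
```

Applied to the sample document from above, the sketch collects libraries/js/site.js, libraries/css/layout.css and https://github.com, in that order.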
Use the method getInfos() to retrieve URLInfo instances instead of strings. This allows direct access to information on the URLs.