Extracting multiple matches from a string using preg_replace [message #178582] Mon, 09 July 2012 08:50
jwcarlton
I'm working with a message board database that already has a bunch of YouTube links in the comments, and I'm trying to replace all of the links with a new alternate.

The existing strings are like:

$this_comment = '<a href="" target="_new">http://www.youtube.com/watc...vidid</a><br><br><a href="" target="_new">http://www.youtube.com/watc...vidid_2</a>';

Notice that this string has 2 separate YouTube links.

If you're not familiar, YouTube has several possible link formats, so using parse_url() doesn't really work:

youtube.com/v/{vidid}
youtube.com/vi/{vidid}
youtube.com/?v={vidid}
youtube.com/?vi={vidid}
youtube.com/watch?v={vidid}
youtube.com/watch?vi={vidid}
youtu.be/{vidid}
youtube.com/v/{vidid}?feature=autoshare&version=3&autohide=1&autoplay=1


I've written the regex to find the ID and replace the link correctly, but it only works with the first link that it finds. How do I make it work with all of the matching links in the string?

Here's what I have:

// Fetch the VIDID
$this_id = preg_replace("#.*?<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>.*#",
  "$2", $this_comment);

// I'm not sure why preg_replace isn't catching the extra variables;
// I thought the [&.+]* would do this? Either way, this is a workaround:
list($this_id) = explode("&", $this_id);

// Replace link
if ($this_id) {
$new_link = "Example replacement: $this_id";

$this_comment = preg_replace("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*" . $this_id . "[&.+]*\" target=\"_new\">.*?<\/a>#",
"$new_link", $this_comment);
}
Re: Extracting multiple matches from a string using preg_replace [message #178583 is a reply to message #178582] Mon, 09 July 2012 12:44
Captain Paralytic
Well I'm very confused by all this. First of all, why are you using
preg_replace to extract the vidid? I would have thought that a job
better suited to preg_match.

Next, in your string assigned to $this_comment, the first vidid is
different to the other 2, so why are you expecting $this_id to match
all of them?
Re: Extracting multiple matches from a string using preg_replace [message #178584 is a reply to message #178582] Mon, 09 July 2012 13:29
Peter H. Coffin
On Mon, 9 Jul 2012 01:50:39 -0700 (PDT), Jason C wrote:

> I've written the regex to find the ID and replace the link correctly,
> but it only works with the first link that it finds. How do I make it
> work with all of the matching links in the string?

That's the drawback to using preg_replace() for this. You can't capture
all the bits you want to extract because you *must* enumerate them.
preg_match_all() returns an array of matches, which is what you want if
you don't know how many you're going to get back going in.
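A minimal sketch of the preg_match_all() approach described above. The sample hrefs and video IDs are made up for illustration, and the pattern is a simplified one written for this sketch, not the poster's regex.

```php
<?php
// Hypothetical sample comment; the hrefs and video IDs are made up.
$this_comment = '<a href="http://www.youtube.com/watch?v=abc123" target="_new">first</a><br><br>'
              . '<a href="http://youtu.be/def456" target="_new">second</a>';

// Simplified pattern (an assumption for this sketch). Group 1 captures the ID.
$pattern = '#<a href="http://(?:www\.)?(?:youtube\.com|youtu\.be)/'
         . '(?:watch\?vi?=|\?vi?=|vi?/)?([^&?"]+)[^"]*" target="_new">.*?</a>#';

// preg_match_all() returns the number of full matches and fills $matches.
$count = preg_match_all($pattern, $this_comment, $matches);

// $matches[0] holds every full matched link, $matches[1] every captured ID.
print_r($matches[1]);
```

With the default flags, $matches is indexed by group first, so $matches[1] is a flat list with one ID per link found.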

> Here's what I have:
>
> // Fetch the VIDID
> $this_id = preg_replace("#.*?<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>.*#",
>   "$2", $this_comment);
     ^^ -- enumerated result
>
> // I'm not sure why preg_replace isn't catching the extra variables;
> // I thought the [&.+]* would do this? Either way, this is a workaround:
> list($this_id) = explode("&", $this_id);

Define "catching" in this context. If you want it back, you need to
paren-tag it so it goes into an enumerated output slot.

> // Replace link
> if ($this_id) {
>   $new_link = "Example replacement: $this_id";
>
>   $this_comment = preg_replace("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*" . $this_id . "[&.+]*\" target=\"_new\">.*?<\/a>#",
>     "$new_link", $this_comment);
> }


Re: Extracting multiple matches from a string using preg_replace [message #178586 is a reply to message #178583] Mon, 09 July 2012 18:58
jwcarlton
On Monday, July 9, 2012 8:44:06 AM UTC-4, Captain Paralytic wrote:
> Well I'm very confused by all this. First of all, why are you using
> preg_replace to extract the vidid? I would have thought that a job
> better suited to preg_match.

Probably just a lack of knowledge on my part. I thought that preg_match was used to find if the regex was true or false, and then preg_replace would be used to replace whatever.

From Peter's reply, I don't think that either of them is the right command. But for the sake of my own learning, how would I have modified my script (catching only one) to use preg_match instead of preg_replace?

And, if preg_replace works, then what's the advantage? Speed?


> Next, in your string assigned to $this_comment, the first vidid is
> different to the other 2, so why are you expecting $this_id to match
> all of them?

No, that was the point; they're not going to match, so I need to modify the script to replace ALL of the existing links with the ID that's in that link.

That's why I turned to you guys. My only thought was to put the script in a function, then use a while() loop to keep running the function until there were no more links. I couldn't get it to work, though, and I didn't like the idea of using a loop on it, anyway, so I thought you guys might have a better suggestion.
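For the single-link case asked about above, a minimal preg_match() sketch might look like the following. The href and video ID are made up, and the pattern is a simplified one written for this sketch, not the poster's regex.

```php
<?php
// Hypothetical sample; the href and video ID are made up.
$this_comment = '<a href="http://www.youtube.com/watch?v=abc123" target="_new">clip</a>';

// Simplified pattern (an assumption for this sketch). Group 1 captures the ID.
$pattern = '#<a href="http://(?:www\.)?(?:youtube\.com|youtu\.be)/'
         . '(?:watch\?vi?=|\?vi?=|vi?/)?([^&?"]+)[^"]*" target="_new">.*?</a>#';

$this_id = null;

// preg_match() returns 1 on a match and 0 on no match, and stops at the
// first match. The captures land in $m: $m[0] is the whole link, $m[1] the ID.
if (preg_match($pattern, $this_comment, $m) === 1) {
    $this_id = $m[1];
    echo $this_id, "\n";
}
```

The practical difference from the preg_replace() version is less about speed than about intent: preg_match() hands back the captured group directly, so the pattern does not need a leading `.*?` and trailing `.*` to swallow the rest of the string.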
Re: Extracting multiple matches from a string using preg_replace [message #178587 is a reply to message #178584] Mon, 09 July 2012 19:00
jwcarlton
On Monday, July 9, 2012 9:29:15 AM UTC-4, Peter H. Coffin wrote:
> That's the drawback to using preg_replace() for this. You can't capture
> all the bits you want to extract because you *must* enumerate them.
> preg_match_all() returns an array of matches, which is what you want if
> you don't know how many you're going to get back going in.

Thanks, I'll modify the script this evening with preg_match_all(). :-D


>> // I'm not sure why preg_replace isn't catching the extra variables;
>> // I thought the [&.+]* would do this? Either way, this is a workaround:
>> list($this_id) = explode("&", $this_id);
>
> Define "catching" in this context. If you want it back, you need to
> paren-tag it so it goes into an enumerated output slot.

No, I mean that it's not removing the additional variables. So, this:

&feature=related

Returns this as $this_id:

123456&feature=related

when I just want the "123456".
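The behaviour described above follows from how the two pieces of the pattern interact: `[&.+]` is a character class that matches one literal `&`, `.` or `+`, not "an ampersand followed by anything". Since `feature=related` contains none of those characters, the lazy `(.*?)` has to keep growing until the closing quote is reachable. A small sketch, using a cut-down subject string and cut-down patterns that are assumptions for illustration:

```php
<?php
// Cut-down subject: just the tail of a hypothetical href, closing quote included.
$subject = 'v=123456&feature=related"';

// Lazy group followed by the character class from the original regex.
// [&.+]* can only consume "&", "." and "+", so (.*?) expands up to the quote.
preg_match('#v=(.*?)[&.+]*"#', $subject, $lazy);

// Excluding "&" and the quote from the ID, then skipping the rest of the
// attribute with [^"]*, stops the capture at the first parameter.
preg_match('#v=([^&"]+)[^"]*"#', $subject, $strict);

echo $lazy[1], "\n";   // 123456&feature=related
echo $strict[1], "\n"; // 123456
```

With a capture like `([^&"]+)`, the explode("&", ...) workaround should no longer be needed.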
Re: Extracting multiple matches from a string using preg_replace [message #178590 is a reply to message #178584] Tue, 10 July 2012 09:08
jwcarlton
On Monday, July 9, 2012 9:29:15 AM UTC-4, Peter H. Coffin wrote:
> That's the drawback to using preg_replace() for this. You can't capture
> all the bits you want to extract because you *must* enumerate them.
> preg_match_all() returns an array of matches, which is what you want if
> you don't know how many you're going to get back going in.

Just a note for anyone else reading this later, preg_match_all() did work perfectly. I changed:

$this_id = preg_match(...);

to this:

preg_match_all("#<a href=\"http://.*?youtu\.*?be[\.com]*/[watch]*[\?]*(v/|v=|vi/|vi=)*(.*?)[&.+]*\" target=\"_new\">.*?<\/a>#m",
  $this_comment, $matches);


This gives me a multidimensional array of $matches, where $matches[2] is the array that holds the values from $2.

So after finding the array, it's a simple matter of putting the second preg_replace() in a foreach loop:

foreach ($matches[2] as $this_id) {
// I'm still not sure why $this_id is keeping the other params
list($this_id) = explode("&", $this_id);

$this_comment = preg_replace(...);
}

Thanks for the help, Peter! If you happen to see the error I'm making with the extra params (forcing me to use explode to get rid of them), I'd appreciate any insight. The workaround is working, though, so it's not a big deal... just sloppy, I guess.
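As an alternative to the match-then-loop approach, preg_replace() already replaces every match in the subject, and a `$1` backreference in the replacement refers to the ID captured in that particular match. The sketch below swaps in that technique; the hrefs and video IDs are made up, and the pattern is a simplified one written for this sketch, not the poster's regex.

```php
<?php
// Hypothetical sample comment; the hrefs and video IDs are made up.
$this_comment = '<a href="http://www.youtube.com/watch?v=abc123&feature=related" target="_new">first</a><br><br>'
              . '<a href="http://youtu.be/def456" target="_new">second</a>';

// Simplified pattern (an assumption for this sketch). The pattern covers only
// the link itself, with no leading or trailing .*, so every link is its own match.
$pattern = '#<a href="http://(?:www\.)?(?:youtube\.com|youtu\.be)/'
         . '(?:watch\?vi?=|\?vi?=|vi?/)?([^&?"]+)[^"]*" target="_new">.*?</a>#';

// $1 in the replacement is the ID captured by the match being replaced.
$result = preg_replace($pattern, 'Example replacement: $1', $this_comment);

echo $result, "\n";
// Example replacement: abc123<br><br>Example replacement: def456
```

If the replacement needs logic beyond a fixed template, preg_replace_callback() takes a function that receives each match's captures.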