Oh yes, there is another slight oversight as well.
$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) { // USE_PATH_INFO
$filetext .= "\t<loc>${WWW_ROOT}index.php/t/${thread_id}/</loc>\n";
} else {
$filetext .= "\t<loc>${WWW_ROOT}index.php?t=msg&th=${thread_id}&start=0</loc>\n";
}
Should index.php really be hard-coded? Shouldn't it be replaced by ${ROOT} or something similar? Like this:
$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) { // USE_PATH_INFO
$filetext .= "\t<loc>${WWW_ROOT}${ROOT}t/${thread_id}/${thread_title_SEO}/</loc>\n";
} else {
$filetext .= "\t<loc>${WWW_ROOT}${ROOT}?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n"; // & must be entity-escaped inside XML
}
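One caveat for the non-PATH_INFO branch: a literal & is not allowed inside an XML element, and the sitemap protocol asks for URLs to be entity-escaped. A minimal sketch of wrapping a raw URL, assuming a helper called sitemap_loc() (a made-up name, not part of FUDforum) and an example.com address as a placeholder:

```php
<?php
// Hypothetical helper, not a FUDforum function: wraps a raw URL in a
// <loc> element and entity-escapes it so the sitemap stays valid XML.
function sitemap_loc($url)
{
    // htmlspecialchars() converts &, <, > and both quote characters.
    return "\t<loc>" . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . "</loc>\n";
}

// Placeholder URL for illustration only.
echo sitemap_loc('http://example.com/forum/index.php?t=msg&th=42&start=0');
// prints a tab, then:
// <loc>http://example.com/forum/index.php?t=msg&amp;th=42&amp;start=0</loc>
```

The helper expects the raw, unescaped URL; passing in a string that already contains &amp; would escape it a second time.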
With my SEO tweak, the whole code now looks like this:
(I inner-joined the msg table to get the thread subject, so that I could lowercase it and replace all the special characters.)
Note: without tweaks to users.inc.t, threads whose subject starts with a number will be interpreted as "&start=20" (where 20 is the number) and the sitemap link won't work. I fixed this with an is_numeric check in users.inc.t. It would still break on a thread whose subject actually is a number, but I can live with that. Another fix could be to simply start the SEO subject with a "-".
PLEASE note that my str_replace code is UGLY and should be cleaned up by someone who is properly skilled with str_replace or regular expressions. I have no clue about that.
#!/usr/bin/php -q
<?php
/**
* copyright : (C) 2001-2010 Advanced Internet Designs Inc.
* email : forum(at)prohost(dot)org
* $Id$
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; version 2 of the License.
**/
/* Google sitemap settings. */
$frequency = 'weekly';
$priority = '0.5';
$auth_as_user = 0; // User 0 == anonymous.
set_time_limit(0);
ini_set('memory_limit', '128M');
define('forum_debug', 1);
unset($_SERVER['REMOTE_ADDR']);
if (strncmp($_SERVER['argv'][0], '.', 1)) {
require (dirname($_SERVER['argv'][0]) .'/GLOBALS.php');
} else {
require (getcwd() .'/GLOBALS.php');
}
fud_use('err.inc');
fud_use('db.inc');
// Limit topics to what the user has access to.
if ($auth_as_user) {
$join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=2147483647 AND g1.resource_id=f.id
LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g2 ON g2.user_id='. $auth_as_user .' AND g2.resource_id=f.id
LEFT JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'mod mm ON mm.forum_id=t.forum_id AND mm.user_id='. $auth_as_user .' ';
$lmt = '(mm.id IS NOT NULL OR (COALESCE(g2.group_cache_opt, g1.group_cache_opt) & 2) > 0)';
} else {
$join = 'INNER JOIN '. $GLOBALS['DBHOST_TBL_PREFIX'] .'group_cache g1 ON g1.user_id=0 AND g1.resource_id=t.forum_id ';
$lmt = '(g1.group_cache_opt & 2) > 0';
}
$c = uq('SELECT t.id, t.last_post_date, t.root_msg_id, m.id, m.subject FROM '. $GLOBALS['DBHOST_TBL_PREFIX'] .'thread t '. $join .'
inner join '. $GLOBALS['DBHOST_TBL_PREFIX'] .'msg m ON t.root_msg_id = m.id
WHERE '. $lmt .' ORDER BY t.last_post_date DESC LIMIT 50000');
echo "Writing sitemap.xml file to ${GLOBALS['WWW_ROOT_DISK']}\n";
$fh = fopen($GLOBALS['WWW_ROOT_DISK'].'/sitemap.xml', 'w');
$xmlhead = <<<EOF
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">\n
EOF;
fwrite($fh, $xmlhead);
while ($r = db_rowarr($c)) {
$thread_id = $r[0];
// Sitemap <lastmod> uses W3C datetime (date first, then time); gmdate() keeps it in UTC to match the +00:00 suffix added below.
$post_stamp = gmdate('Y-m-d\TH:i:s', $r[1]);
$thread_title_SEO = str_replace(" ","-",$r[4]);
$thread_title_SEO = strtolower($thread_title_SEO);
$thread_title_SEO = preg_replace('/[^a-z0-9_]/i', '-', $thread_title_SEO);
$thread_title_SEO = preg_replace('/_[_]*/i', '-', $thread_title_SEO);
$thread_title_SEO = str_replace('---', '-', $thread_title_SEO);
$thread_title_SEO = str_replace('--', '-', $thread_title_SEO);
$thread_title_SEO = str_replace('-s-', 's-', $thread_title_SEO);
$thread_title_SEO = str_replace("%","",$thread_title_SEO);
$filetext = "<url>\n";
if ($FUD_OPT_2 & 32768) { // USE_PATH_INFO
$filetext .= "\t<loc>${WWW_ROOT}${ROOT}t/${thread_id}/${thread_title_SEO}/</loc>\n";
} else {
$filetext .= "\t<loc>${WWW_ROOT}${ROOT}?t=msg&amp;th=${thread_id}&amp;start=0</loc>\n"; // & must be entity-escaped inside XML
}
$filetext .= "\t<lastmod>${post_stamp}+00:00</lastmod>\n";
$filetext .= "\t<changefreq>$frequency</changefreq>\n";
$filetext .= "\t<priority>$priority</priority>\n";
$filetext .= "</url>\n";
fwrite($fh, $filetext);
}
fwrite($fh, "</urlset>\n");
fclose($fh);
$google = 'www.google.com';
echo "Notify $google...";
if($fp = @fsockopen($google, 80)) {
$req = "GET /webmasters/sitemaps/ping?sitemap=". urlencode($GLOBALS['WWW_ROOT'].'sitemap.xml') ." HTTP/1.1\r\n".
"Host: $google\r\n".
"User-Agent: FUDforum $FORUM_VERSION\r\n".
"Connection: Close\r\n\r\n";
fwrite($fp, $req);
while(!feof($fp)) {
if( @preg_match('~^HTTP/\d\.\d (\d+)~i', fgets($fp, 128), $m) ) {
echo ' status: '. intval($m[1]) ."\n";
break;
}
}
fclose($fp);
}
echo "Done!\n";
?>
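For what it's worth, the str_replace/preg_replace chain in the script could probably be collapsed into something like the sketch below. It is only a rough idea under a few assumptions: seo_slug() is a made-up name (not a FUDforum function), the subject is treated as plain ASCII text (anything else, including non-Latin characters, is turned into hyphens), and a leading "-" is added when the slug starts with a digit, following the alternative fix mentioned in the note above. Whether the forum's PATH_INFO handling accepts such a leading hyphen is untested.

```php
<?php
// Hypothetical helper, not part of FUDforum: turns a thread subject
// into a lowercase, hyphen-separated slug for use in a URL path.
function seo_slug($subject)
{
    $slug = strtolower($subject);
    // Drop apostrophes so "John's" becomes "johns" rather than "john-s".
    $slug = str_replace("'", '', $slug);
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove hyphens left over at either end.
    $slug = trim($slug, '-');
    // Assumption: a leading hyphen keeps a slug that starts with a digit
    // from being read as a numeric "start" offset.
    if ($slug !== '' && ctype_digit($slug[0])) {
        $slug = '-' . $slug;
    }
    return $slug;
}

echo seo_slug("John's 100% NEW_car!!"), "\n";      // johns-100-new-car
echo seo_slug('20 ways to break a sitemap'), "\n"; // -20-ways-to-break-a-sitemap
```

In the loop this would replace the block of $thread_title_SEO assignments with a single call on $r[4]. A subject made up entirely of special characters yields an empty slug, so that case may need a fallback.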