Plugin: CP Redirect

5:56 pm
August 9, 2009


Daniel Bachhuber

Admin

posts 102

I've started working on another plugin, this one focused on redirecting old College Publisher URLs to whatever you've got with WordPress. For the time being, it's called "CP Redirect".

At the moment, I'm trying to figure out which College Publisher URLs it needs to support and how those look and work. Thus far, I've only come across two styles of URLs:

The ones served off the "media.www" subdomain will probably require additional setup of a 301 redirect from that to just the www address. Another idea is that, instead of watching the URL as it comes in, the plugin will just run if WordPress is going to return a 404. I haven't found the hook for that yet, although I imagine there is one.

Are there any College Publisher URL styles I'm missing? Thanks in advance.

6:28 pm
August 9, 2009


Andrew Nacin

New Member

posts 1


Hi Daniel,

Thanks for inviting me here. As I've mentioned via e-mail, there are two additional URLs you'll want to trap for:

  1. http://www.dailyorange.com/new…..9205.shtml
  2. http://www.dailyorange.com/hom…..c29ccd1db3

The first URL is (almost) always redirected to the CP4 URL you mention above, with media/storage/paperXXX prepended to the path and media added as a new subdomain level.

The second URL is not redirected, thus creating a duplicate content issue for SEO purposes.

Because it is (almost) always redirected, the search engines will (almost) never index the first URL. However, they will often index the second URL in addition to the URL you mention.

Additionally, both URLs are widely available to readers and can often be distributed. Catching these two URLs is important not so much for SEO and indexing purposes, but for preventing general link rot.

The CP4 system returns the first URL whenever the article link is requested in its template language. Thus, any URL copied off of a CP4 site home page or section page will be this first URL, not the media.www URL. The CP4 system returns the second URL on a few other index pages, notably author pages. The point is, both are available to readers.

Trapping the first URL is easy. It just requires a more robust regexp to identify it, as you won't have the subdomain or media/storage/paperXXX to rely on. (Of note, you mentioned the 301 redirect to get it off the subdomain. I think this plugin should function as a redirect and not a rewrite, so every hop would be a 301. Otherwise, you won't be indicating that the content did indeed move, and you'll have duplicate content.) Trapping the second URL is more difficult, but not impossible.
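
As a rough illustration of what that "more robust regexp" could look like, here is a minimal sketch. It assumes the path shape seen in the full URLs quoted elsewhere in this thread (an optional media/storage/paperXXX prefix, a section, a yyyy/mm/dd date, and a headline ending in a dash plus the numeric ID). The function name cp_extract_article_id and the example.com URL are made up for illustration and are not part of any released plugin.

```php
<?php
// Hypothetical helper (name invented for this sketch).
// Returns the numeric College Publisher article ID found in a URL or path,
// or null when the path does not look like a CP4 article URL.
function cp_extract_article_id($url) {
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }
    // Assumed shape: optional "media/storage/paperNNN/" prefix, a section,
    // a yyyy/mm/dd date, then anything ending in "-<digits>.shtml".
    $pattern = '#^/(?:media/storage/paper\d+/)?[^/]+/\d{4}/\d{2}/\d{2}/.+-(\d+)\.shtml$#i';
    if (preg_match($pattern, $path, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Example call with a made-up URL; the host is ignored, only the path matters.
$id = cp_extract_article_id('http://www.example.com/news/2005/01/18/Style/Some.Headline-834737.shtml');
```

Because only the path is inspected, the same check should work whether the request arrives on www, media.www, or the bare domain.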

7:46 pm
August 9, 2009


William P. Davis

Veazie, Maine

Admin

posts 65

There are a few ways to do it. I've set up mod_rewrite to redirect media.www.mainecampus.com and media.mainecampus.com to mainecampus.com, with the following code:

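RewriteEngine On
# Only redirect requests that arrived on the old media subdomains,
# so the rule cannot loop if the main site shares this configuration.
RewriteCond %{HTTP_HOST} ^media\.(www\.)?mainecampus\.com$ [NC]
RewriteRule ^(.*)$ http://mainecampus.com/$1 [R=301,L]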

I saved the post IDs from CP when I imported the articles, so this is the best way to redirect for me. I put this in my 404.php page. It takes the id of the original post (in the URL http://media.www.dailyorange.c…..9205.shtml, 3749205 is the ID) and displays a link to the post if it exists.

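// Take the last path segment (ignoring any query string) and drop ".shtml".
$better_search = basename(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), ".shtml");
// The CP article ID is the number after the final dash.
$better_search = explode("-", $better_search);
$better_search_id = (int) end($better_search);
// Use a separate variable so the global $post is not overwritten.
$better_post = $better_search_id ? get_post($better_search_id) : null;
if ($better_post) {
    echo "You may have been looking for:<h1><a href=\"" . esc_url(get_permalink($better_post->ID)) . "\">" . get_the_title($better_post->ID) . "</a></h1>";
}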

If you didn't save the post ID, you can do it this way:

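// Build a search phrase from the headline part of the old URL.
$search_term = basename(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), ".shtml");
// Drop the trailing "-<article ID>" whatever its length, rather than a fixed 7 characters.
$search_term = preg_replace('/-\d+$/', '', $search_term);
$search_term = str_replace(array("-", "."), " ", $search_term);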
?>
<br />

<?php query_posts('s=' . urlencode($search_term)); ?>
<?php if ( have_posts() ) : ?>
<div class="cleardouble"></div>
<div class="content-column left"><h3>You might have been looking for these posts:</h3>

<?php while ( have_posts() ) : the_post(); ?>
    <div class="clear" style="margin: 7px 0 7px 0;"></div>
        <?php if (function_exists('the_thumbs')) {  the_thumbs(); } ?>       
        <small><?php the_time('l, F jS, Y, g:i a') ?>
                <?php if (in_category('News')) { ?> in News
                <?php } elseif (in_category('Opinion')) { ?> in Opinion
                <?php } elseif (in_category('Sports')) { ?> in Sports
                <?php } elseif (in_category('Style &amp; Culture')) { ?> in Style & Culture<?php } ?></small><br />
               
            <h2><a href="<?php the_permalink() ?>" rel="bookmark" title="<?php the_title_attribute(); ?>"><?php the_title(); ?></a></h2>
            <?php if ( get_post_meta($post->ID, 'Subhead', true) ): ?><span class="subhed"><?php echo get_post_meta($post->ID, 'Subhead', true); ?></span><br /><?php endif; ?>
            <?php if ( !in_category('Editorials') ) { ?>
            <small>By <?php if ( get_post_meta($post->ID, 'Author', true) ):  echo get_post_meta($post->ID, 'Author', true); else: coauthors_posts_links(); endif; ?></small><br />  <?php } ?>
           
            <p><?php the_excerpt(); ?></p>

<?php endwhile; ?></div><?php else : endif; ?>

The second type of redirection is less desirable, since the CP URLs don't display characters such as periods and dashes, and therefore a title like "Rotten.com: The little Web site of appalling horror" displays like this: Rottencom The little etc etc. As a result, any title with a dash or period does not come up in the search.

You can make a more advanced link with the first method with the following code:

<?php query_posts('p='. $better_search_id ); ?>

Followed by the loop, such as in the second example.

I've found the second type of url (home/index.cfm?uid etc etc etc) represents less than 1 percent of our hits, so, to be honest, I don't give a shit about it.

wpdavis.com | Editor in Chief, The Maine Campus | Associate, CoPress | will@copress.org | 207.660.5342

7:50 pm
August 9, 2009


William P. Davis

Veazie, Maine

Admin

posts 65

P.S. I suggest putting this on your 404 page and not creating redirects, since it will encourage people to update their bookmarks, etc.

8:13 pm
August 9, 2009


William P. Davis

Veazie, Maine

Admin

posts 65



P.P.S. Not all stories are located in the media/storage/paperxxx folder. Some of the URLs are like:

http://mainecampus.com/news/2005/01/18/Style/Rotten.com.The.Little.Web.Site.Of.Appalling.Horror-834737.shtml

Some of them are like:

http://mainecampus.com/media/storage/paper322/news/2008/04/17/Style/Let-The.Games.Begin-3332215.shtml

And, of course, the infamous:

http://www.mainecampus.com/home/index.cfm?event=displayArticle&ustory_id=6e8139f5-656d-4edb-805d-578ee4cdd50a

One of the biggest problems I had with CP was the terrible content structure. To have three different ways to get to the same article is ridiculous. Also, since there is no consistent structure, setting up a blanket redirect is a bad idea, and using the 404 page is better.
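
For anyone who wants to tell the three shapes apart in code, here is a minimal sketch. It assumes only what the three example URLs above show: a numeric ID before .shtml, with or without the media/storage/paperXXX prefix, or a ustory_id query parameter on home/index.cfm. The function name cp_classify_url and the style labels are invented for illustration.

```php
<?php
// Hypothetical helper (name and labels invented for this sketch).
// Sorts an old College Publisher URL into one of the three shapes listed above
// and returns the identifier it carries.
function cp_classify_url($url) {
    $unknown = array('style' => 'unknown', 'key' => null);
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['path'])) {
        return $unknown;
    }
    $path = $parts['path'];

    // Shapes 1 and 2: ".../Headline-<digits>.shtml", with or without the storage prefix.
    if (preg_match('#^/(media/storage/paper\d+/)?.+-(\d+)\.shtml$#i', $path, $m)) {
        $style = ($m[1] !== '') ? 'storage' : 'plain';
        return array('style' => $style, 'key' => (int) $m[2]);
    }

    // Shape 3: home/index.cfm with a ustory_id parameter (a UUID, not a numeric ID).
    if (strcasecmp($path, '/home/index.cfm') === 0 && isset($parts['query'])) {
        parse_str($parts['query'], $query);
        if (isset($query['ustory_id']) && is_string($query['ustory_id'])) {
            return array('style' => 'uuid', 'key' => $query['ustory_id']);
        }
    }
    return $unknown;
}

$plain   = cp_classify_url('http://mainecampus.com/news/2005/01/18/Style/Rotten.com.The.Little.Web.Site.Of.Appalling.Horror-834737.shtml');
$storage = cp_classify_url('http://mainecampus.com/media/storage/paper322/news/2008/04/17/Style/Let-The.Games.Begin-3332215.shtml');
$uuid    = cp_classify_url('http://www.mainecampus.com/home/index.cfm?event=displayArticle&ustory_id=6e8139f5-656d-4edb-805d-578ee4cdd50a');
```

The third shape carries a UUID rather than the numeric article ID, so it can only be matched if that value was also saved during import.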

8:13 pm
August 11, 2009


Daniel Bachhuber

Admin

posts 102

An update on where I think I'm going with the plugin. After talking it over with Miles, we still think it would be useful to redirect as many URLs as we can safely catch. I think the plugin will still have this as an option that you can turn on. Our approach will be to pull the article ID out of the incoming URL, and then try to match that article ID against a post ID. If a matching post exists, then do the redirect. If it doesn't, then go on to the 404 page. The second option for the user will be to drop a function on the 404 page that will probably just mirror the brilliant approach you described.

Using this approach, we'll also be able to build a plugin that's useful to users who have the article ID saved as a custom field as well as those who don't.
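
To make the flow above concrete, here is a minimal sketch of the decision step only, kept free of WordPress calls so it can be run on its own. The function name cp_redirect_decision, the $find_permalink callback, and the example.com lookup table are all hypothetical. In a real plugin the callback would be the part that checks for a post ID or a custom field.

```php
<?php
// Hypothetical sketch of the decision step (names invented for illustration).
// $find_permalink is a callback: given a CP article ID it returns the new URL
// as a string, or null when no imported post matches.
function cp_redirect_decision($request_uri, $find_permalink) {
    $not_found = array('status' => 404, 'location' => null);
    $path = parse_url($request_uri, PHP_URL_PATH);
    // Pull the article ID out of the incoming URL.
    if (!is_string($path) || !preg_match('#-(\d+)\.shtml$#i', $path, $m)) {
        return $not_found;
    }
    // Try to match the article ID against an imported post.
    $target = call_user_func($find_permalink, (int) $m[1]);
    if (is_string($target) && $target !== '') {
        return array('status' => 301, 'location' => $target);
    }
    // No match: fall through to the normal 404 page.
    return $not_found;
}

// Made-up lookup table standing in for the database.
$imported = array(3332215 => 'http://example.com/?p=3332215');
$lookup = function ($id) use ($imported) {
    return isset($imported[$id]) ? $imported[$id] : null;
};

$hit  = cp_redirect_decision('/media/storage/paper322/news/2008/04/17/Style/Let-The.Games.Begin-3332215.shtml', $lookup);
$miss = cp_redirect_decision('/news/2005/01/18/Style/Not.Imported-111.shtml', $lookup);
```

Inside WordPress, one plausible wiring (an assumption, not a settled design) would be to run this on the template_redirect action only when is_404() is true, and pass a 301 result to wp_redirect(). Swapping the callback is what would let the same code serve sites that kept the article ID as the post ID and sites that stored it in a custom field.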

