Moodle / MDL-7299

Google follows links on recent activity page and gets too much duplicate content

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7
    • Fix Version/s: 1.6.4, 1.7, 1.8
    • Component/s: General
    • Labels:
      None
    • Affected Branches:
      MOODLE_17_STABLE
    • Fixed Branches:
      MOODLE_16_STABLE, MOODLE_17_STABLE, MOODLE_18_STABLE

      Description

      I have discovered what may be a bug that causes a Moodle site to be banned by Google, as many of the important pages in my site seem to have been in the last few days. Pages from my forums that once were at the top of certain searches have disappeared entirely from Google's index. Google also dropped the number of times it visited my site this month dramatically.

      In researching this, I learned that Google penalizes sites for lots of duplicate content. How does Moodle make duplicate content? Through the recent activity page. The problem with the recent activity page is that the links to 1 day, 7 days, etc. are generated on the fly using Unix timestamps. This means that every time Google's bot follows one of these links, it lands on a page with a new set of timestamp links, so Google keeps seeing what look like new links when they really lead to the same page with the exact same content. I complained about this in the Moodle forums over a year ago because I thought it was a waste of my bandwidth, but it seems that the really serious problem is getting banned by Google.
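
      To illustrate the mechanism described above, here is a minimal sketch in Python (not Moodle's actual PHP code). The path `/course/recent.php`, the query parameters `id`, `date`, and `days`, and the one-visit-per-minute crawl schedule are assumptions made for the example. It compares links that embed an absolute Unix timestamp with links that embed only a relative window in days, which is one possible remedy and not necessarily the fix that was applied.

```python
from urllib.parse import urlencode

DAY = 24 * 60 * 60  # seconds in a day


def timestamp_links(now, course_id=1, spans=(1, 7, 30)):
    """Links whose query string embeds an absolute Unix timestamp (hypothetical URL shape)."""
    return [
        "/course/recent.php?" + urlencode({"id": course_id, "date": now - days * DAY})
        for days in spans
    ]


def relative_links(course_id=1, spans=(1, 7, 30)):
    """Links whose query string embeds only the window length in days (hypothetical URL shape)."""
    return [
        "/course/recent.php?" + urlencode({"id": course_id, "days": days})
        for days in spans
    ]


def crawl(link_fn, visits, start=1_162_857_600, step=60):
    """Count the distinct URLs a crawler collects when it revisits the page every `step` seconds."""
    seen = set()
    for i in range(visits):
        seen.update(link_fn(start + i * step))
    return len(seen)


absolute = crawl(timestamp_links, visits=100)
relative = crawl(lambda now: relative_links(), visits=100)
print(absolute, relative)  # prints "300 3"
```

      With timestamp-based links, every visit yields three URLs the crawler has never seen, so the set of "pages" grows without bound even though the content is the same. With relative links, the set stays at three no matter how often the page is crawled.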

      I then looked at which pages are still in Google's index and found hundreds, if not thousands, of recent activity pages from my site, which have obviously been building up over time. For an active site like Moodle.org this is probably not a threat, because the recent activity pages probably change frequently and are outnumbered by forum posts. But for less active sites, this means that Google will be seeing lots of virtually identical, if not identical, pages.

      I have now put a robots.txt block on this on my own site and have had to submit a request to Google for reinclusion in their search engine. But I thought I should report this because it does serious harm to one's business if one is trying to make money from one's Moodle site, and so in my opinion it is a really important issue that should be dealt with.
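
      The exact robots.txt rule is not given in this report. As a hedged sketch, a block of this kind might look like the rule below, which assumes the recent activity page lives at `/course/recent.php` and uses `example.com` and the forum URL purely as placeholders. The Python standard library's `urllib.robotparser` can be used to check that such a rule blocks the recent activity links while leaving forum pages crawlable.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: the path is a guess at where the recent activity page lives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /course/recent.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs for illustration only.
recent_ok = parser.can_fetch(
    "Googlebot", "https://example.com/course/recent.php?id=1&date=1162771200"
)
forum_ok = parser.can_fetch(
    "Googlebot", "https://example.com/mod/forum/discuss.php?d=42"
)
print(recent_ok, forum_ok)  # prints "False True"
```

      Because robots.txt rules match on URL prefix, a single `Disallow` line covers every timestamp variant of the page.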

            People

            • Votes: 0
            • Watchers: 1

              Dates

              • Fix Release Date: 7/Nov/06