A guide to RSS feeds
Contents
- 1 Quick Links
- 2 What is RSS?
- 3 Feed content generation methods
- 4 Auto-Generated RSS feeds / URLs
- 5 RSS feed creation services
- 6 Locating URLs for RSS feeds
- 7 A history of RSS
- 8 RSS specifications
- 9 What does RSS XML (your RSS feed code) look like?
- 10 How is RSS used to make a feed?
- 11 Ensure your RSS feeds conform to standards
- 12 What type of information can be communicated using RSS?
- 13 Errors that occur in FeedWind when reading RSS feeds.
- 14 Feed content, layout and structure
- 15 Creating your own RSS XML
- 16 Feed validation
- 17 RSS versions
- 18 Feed auto-discovery
This article is designed to give users an overview and explanation of RSS feeds, how and where they are used, and where they come from.
What is RSS?
An RSS feed syndicates online content, in complete or summary form, using XML.
RSS uses XML tags to summarize web page content and is designed specifically for syndication, often in an excerpted format. For example, RSS can deliver ticker-style feeds showing the latest stock values, exchange rates or commodity prices. RSS is also commonly used to deliver blog content or news feeds.
This information is ‘fed’ from a web page via an XML (Extensible Markup Language) document. Like HTML, XML is a tag-based language; here it is used to create a structured summary of a feed source, which is typically a web page but can also be a database or other data source. An RSS feed item typically includes text and a link to the originating page, and may also reference images or video.
Feed content generation methods
Not all web pages are composed using HTML alone, as there are many other ways to call, manage and deliver content to a web page, including PHP, JavaScript and other languages. For a page containing just HTML, the process of extracting content to use in XML is relatively simple; where a page contains a variety of code from different languages, things get more complex.
Many platforms that use more than simple HTML, such as WordPress, Drupal and Joomla, provide native RSS support. In other words, RSS XML is automatically generated for their pages. Auto-generated RSS is the easiest to use, as the feed is already formatted and coded, with a feed URL ready for you to use as you need.
Auto-Generated RSS feeds / URLs
Not all web pages are alike when it comes to the availability of an RSS feed. Whether a feed is auto-generated or must be created manually depends on how the site is built and which platform is used to build it.
A web page manually coded using HTML or another language will not have an RSS feed automatically available. Third party software or some sort of plugin or extension is required to get an RSS feed from such pages or sites. Dreamweaver is a commonly used site builder and popular with coders, but it does not include RSS functionality in published pages as a standard feature. When using products such as Dreamweaver, an external RSS feed creation service must be used.
RSS feed creation services
Creating a feed which contains the specific content you are aiming at can be a challenge unless the content at the source page has a regular structure. An irregular post or page structure makes it hard to identify the correct page elements to include in a feed. Websites such as FetchRSS.com, which can create an RSS feed from a web page, still require the data source for the feed to be presented in a regularised format on the page in question. Unless the content is formatted very similarly across the blog or site, the feed will not display correctly. There are very few services which can create an RSS feed for you, so it is advisable to choose your CMS carefully if you want syndication of your content.
Fortunately, most modern CMS platforms do have auto-generated RSS feeds. WordPress and Google Sites, for example, both have auto-generated RSS URLs for every post or page. Web builders such as Wix and Weebly have plugins that enable RSS feeds. Some sites restrict RSS feeds to just their blog pages; others will feed content from any URL or data stream, such as stock market prices or currency exchange rates.
Locating URLs for RSS feeds
To locate an existing feed (or see if your website has one available), there are browser plugins that can be used to detect whether a page actually has a native RSS feed available.
Copying the page URL into FeedBurner.com will test whether a page has auto-generated RSS or not, but will not give you the exact feed URL to use in a service like FeedWind. You can get a feed URL from FeedBurner, but this just adds another server call to the loop, so if your feed URL works correctly there is little motivation to route it through FeedBurner.
It is better to use a browser plugin that can detect RSS feeds and allow you to copy the URL. These URLs can be used in FeedWind directly, which makes such a plugin a useful tool for those who create a lot of feeds from different sites. Most modern CMSs provide an RSS feed automatically when you create a page. Often, adding /feed/ to the end of a page URL is all you need to do. Such a feed URL might be:
https://www.example.com/blog/feed/
Some feed URLs are more complex and require parameters such as:
https://www.example.com/blog/feed?version=RSS2_0
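The URL-guessing approach described above can be sketched in a few lines of Python. This is only an illustration: the suffix list and the function name candidate_feed_urls are assumptions made for this example, and a given site may expose its feed at a different path or not at all.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of common feed suffixes; real sites may use other paths.
COMMON_FEED_SUFFIXES = ["feed/", "rss"]


def candidate_feed_urls(page_url):
    """Return guesses for feed URLs built from a page URL."""
    parts = urlsplit(page_url)
    # Make sure the path ends with a slash before appending a suffix.
    base_path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return [
        urlunsplit((parts.scheme, parts.netloc, base_path + suffix, "", ""))
        for suffix in COMMON_FEED_SUFFIXES
    ]


for url in candidate_feed_urls("https://www.example.com/blog"):
    print(url)
```

Each candidate still has to be requested and checked, since a guessed URL may simply return an ordinary web page or an error.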
A history of RSS
RSS (variously expanded as RDF Site Summary, Rich Site Summary or Really Simple Syndication) has been around since 1999, when two software designers at Netscape, Dan Libby and Ramanathan V. Guha, created a site summary format based on RDF (Resource Description Framework). RSS was envisioned as a way to represent a web page using a modeling framework that interpreted page content and categorized it so that it could be presented elsewhere as a knowledge-base type entity. Simply put, it was an automatic information gathering tool.
Libby and Guha then went on to create another version of the format, which has since been labeled RSS. During the next few years RSS became more formalized. Although both developers worked for Netscape, whose browser was intended to support RSS/RDF, the situation changed and Netscape played no further part in RSS for the next eight years.
Despite a continued lack of support from Netscape, others (the RSS-DEV Working Group, and Dave Winer of UserLand Software) continued to work on RSS. UserLand's software was among the few products that natively supported RSS and could create and read it. By 2001, RSS 1.0 was available and in use among a small community who had tools or browser functionality that supported RSS. In 2002, the specification advanced further with the release of version 2.0.
Over the next 10 years, there was little change to the format. The biggest change has been in the amount of software that supports RSS, particularly web browsers. These days RSS is commonplace: feeds are published on millions of web pages and displayed on just as many through RSS feed widgets, which ‘read’ an XML file and translate it into a list of content items.
RSS specifications
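There have been several incarnations of RSS, and no single entity is responsible for all RSS specifications and updates. The RSS 1.0 specification was published by the RSS-DEV Working Group and can be found at:
https://web.resource.org/rss/1.0/
The specification for XML itself, on which RSS is built, is maintained by W3C:
https://www.w3.org/TR/REC-xml/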
Version 2.0 RSS specifications can be found at:
https://cyber.law.harvard.edu/rss/rss.html
These specs/updates are created by consortiums of knowledgeable industry experts and are generally accepted as industry standard specifications.
What does RSS XML (your RSS feed code) look like?
XML (Extensible Markup Language) is very similar to HTML (HyperText Markup Language). Both are tag-based languages but, unlike HTML, XML does not contain styling data. The RSS feed specification does not include styling information either. In essence, RSS feed XML describes the content of the source data. The only structure is the separation of each entry into a title and a body (description), so a reader can always show the title above the description as we might expect.
An XML RSS feed is broken up into two main elements:
- The Header which contains the title of the overall feed and if applicable, a link to the source. The header section also contains information to define the XML/RSS version used.
- The Content which contains one or more feed entries
Anyone familiar with HTML will find XML easy to follow. The basic structure of the two languages is very similar, both utilizing a tag-based implementation. A typical RSS feed content item would look like this example:
<item>
<title>Breaking Science News</title>
<link>https://news.CNN.com/breaking/</link>
<description>Today the global temperatures reached a new high according to scientists at the University of London’s meteorology department. </description>
<pubDate>Wed, 13 May 2015 00:00:00 GMT</pubDate>
</item>
As you can see, the above example follows a tag-based coding structure very similar to HTML. The number of tags recognized is far greater than shown here and can be seen in detail in the RSS 2.0 specification hosted by the Berkman Center at Harvard Law (listed under Resources at the end of this article). You can also download the RSS 2.0 specification as a zip file.
For those who need to know the inner workings of XML, there is full XML documentation at W3C.org, where you can experiment with XML and get to know how it works.
To display and behave correctly, feeds must comply with the standards set for RSS and its XML; the current version is RSS 2.0. This standard has not changed much in recent years, with some major players pulling out of the RSS market, which was seen to be in decline. However, there has been no suitable replacement, leaving RSS as still the most convenient method for syndicating content from a website. Sites such as Facebook and Google Calendar withdrew support for RSS and instead provide feeds via an API.
The downside is that here at FeedWind, we had to rewrite a lot of our widget code in order to parse the data fed from these APIs. Other sites (e.g. Pinterest, Instagram) have also stopped providing RSS feeds, and we have yet to provide widget support for them. Development is ongoing, and as APIs become available we will be including them in our widget setup.
How is RSS used to make a feed?
An RSS feed is converted into a displayable feed by “parsing” the XML code and producing a readable output. Parsing is a method by which tags are identified and their content directed to the relevant part of our widget display.
The following XML:
<title>Breaking Science News</title>
<description>Today the global temperatures reached a new high according to scientists at the University of London’s meteorology department. </description>
After parsing, this would become:
Breaking Science News
Today the global temperatures reached a new high according to scientists at the University of London’s meteorology department.
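As a rough illustration of the parsing step, the following Python sketch reads the title and description out of an item using the standard library's xml.etree.ElementTree module. It is not FeedWind's parser; the function name parse_item and the shortened sample text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the item shown above (sample data for this sketch).
ITEM_XML = """
<item>
  <title>Breaking Science News</title>
  <description>Today the global temperatures reached a new high.</description>
</item>
"""


def parse_item(xml_text):
    """Extract the title and description text from a single <item>."""
    item = ET.fromstring(xml_text.strip())
    return {
        "title": item.findtext("title", default="").strip(),
        "description": item.findtext("description", default="").strip(),
    }


parsed = parse_item(ITEM_XML)
print(parsed["title"])
print(parsed["description"])
```

The tags themselves are discarded and only their text content is kept, which is why the parsed output carries no formatting of its own.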
RSS feeds carry no formatting (the specification does not allow for it), which is where FeedWind comes in. With the FeedWind setup screen you can specify a broad range of styling options, for example text color, font sizes etc. This is all done through a WYSIWYG interface, so no coding knowledge is required; a few clicks of your mouse will create the styling you need.
After setting up some styling in FeedWind our example from above could look like this:
Breaking Science News
Today the global temperatures reached a new high according to scientists at the University of London’s meteorology department.
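To make the idea of styling applied after parsing more concrete, here is a minimal Python sketch that wraps a parsed title and description in HTML with inline styles. It does not reflect how FeedWind generates its output; the function render_item, its parameters and the feed-item class name are all hypothetical.

```python
from html import escape


def render_item(title, description, color="#333333", font_size_px=14):
    """Wrap plain feed text in HTML, applying styling chosen by the user."""
    style = f"color: {color}; font-size: {font_size_px}px;"
    return (
        f'<div class="feed-item" style="{style}">'
        f"<strong>{escape(title)}</strong>"
        f"<p>{escape(description)}</p>"
        "</div>"
    )


print(render_item("Breaking Science News", "Today the global temperatures reached a new high."))
```

The text is escaped before being placed in the page so that characters such as < or & in a feed cannot be interpreted as markup.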
To use an RSS feed and convert the XML into a user-friendly readable display you need an RSS reader or other software such as FeedWind. FeedWind is a type of RSS reader, but it delivers the output through a widget rather than displaying a webpage with a list of results and some form of navigation. FeedWind can also do this, but the widget can be placed wherever you like on a web page in a fully customizable RSS feed container. The mechanics of how an RSS feed is translated from the original XML are not of significant interest other than to those who wish to code their own RSS feeds.
Ensure your RSS feeds conform to standards
All you really need to know is that your RSS feeds are well-formed and comply with standards. To check this you can use a feed validator: you input an RSS feed URL and the service checks that the XML that makes up the feed is properly formed and fully compliant. One such service is the W3C.org RSS feed validator.
Even the best formed feeds such as the RSS 2.0 example from the specification site at Berkman Center, Harvard can come up with errors. If you look at this example, you will see that although the feed validates as RSS 2.0, there are still elements that could cause a feed reader to throw up errors.
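Before submitting a feed to an online validator, a quick local check can catch the most basic problems. The Python sketch below checks only a few things: that the XML is well-formed, that the root element is rss with a channel inside it, that the channel has the title, link and description elements required by RSS 2.0, and that every item has a title or a description. It is far less thorough than the validators mentioned above, and the function name check_rss is illustrative.

```python
import xml.etree.ElementTree as ET

# Elements that RSS 2.0 requires inside <channel>.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def check_rss(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    if root.tag != "rss":
        return [f"root element is <{root.tag}>, expected <rss>"]
    channel = root.find("channel")
    if channel is None:
        return ["missing <channel> element"]
    problems = []
    for name in REQUIRED_CHANNEL_ELEMENTS:
        if channel.find(name) is None:
            problems.append(f"<channel> is missing <{name}>")
    # RSS 2.0 requires each item to have at least a title or a description.
    for index, item in enumerate(channel.findall("item"), start=1):
        if item.find("title") is None and item.find("description") is None:
            problems.append(f"item {index} has neither <title> nor <description>")
    return problems
```

For example, check_rss applied to a feed whose channel lacks a link element returns a one-entry list naming the missing element, while a feed that passes all of these checks returns an empty list.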
What type of information can be communicated using RSS?
An RSS feed is created using XML, which is capable of delivering all kinds of media, so there are few limits as to what can be included in an RSS feed. Basically, if you can display an item on a webpage you can usually feed it using RSS-compliant XML. There will be some instances where content cannot be displayed through an RSS feed, such as scripts or program output. This is because a webpage that has programmed functionality (a search box, for example) will require local resources such as MySQL databases and local processing power. Authentication is another barrier to using feeds from protected or paid subscription web pages.
A program or script cannot be added to an RSS feed. It would be a security risk at the very least, and RSS-compliant XML does not include that functionality, so a script or program would not be able to run.
So although content can be dynamic (such as an animated .GIF file or a YouTube video) these use resources that are either self-contained, or supplied by the server delivering the feed (e.g. YouTube).
Errors that occur in FeedWind when reading RSS feeds.
There are a number of things that can cause errors when RSS feeds are added to the FeedWind widget, and into other RSS readers and aggregation services. First, let’s have a look at general causes and then at specific examples.
Server Timeouts
Due to the nature of RSS, a successful syndicated feed should be available for update at all times. Without access to the servers that provide these feeds (the hosting server for the originating website), a feed cannot be updated. As an aggregated feed may combine data from multiple websites, the feed reader needs to poll every server periodically to check for updates. FeedWind updates depending on your plan (read more about that here). When the FeedWind crawler checks for updates, the servers providing the data (not the XML itself, but the content that the XML calls) need to be available. If the server providing the content is slow in reacting to a data request, it can cause the FeedWind server to timeout.
Although a healthy 10 second timeout is currently in place at the FeedWind server, things can still go awry. Failure of one feed can cause the whole of an aggregated feed to fail too. This is an issue currently being worked on, but a 10 second delay in server response is unusual, and even if a failure occurs due to a short-term outage, the feed will resume as normal as soon as the server can be polled successfully again. In plain terms, the feed is only disrupted while the originating content server is unavailable, and it self-repairs as soon as that server is back online.
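The effect of a timeout on an aggregated feed can be illustrated with a short Python sketch. This is not a description of how the FeedWind crawler works; it shows one possible approach in which each source is fetched with a 10 second timeout and a failing source is recorded instead of aborting the whole batch. The function names fetch_feed and fetch_all are assumptions made for this example.

```python
from urllib.request import urlopen

TIMEOUT_SECONDS = 10  # same value as the timeout mentioned above


def fetch_feed(url, timeout=TIMEOUT_SECONDS):
    """Download a feed; raises OSError (which includes timeouts) on failure."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_feed):
    """Fetch every feed, recording failures instead of aborting the batch."""
    results, failures = {}, {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            failures[url] = str(err)
    return results, failures
```

Calling fetch_all with a list of feed URLs returns two dictionaries: the raw content of the feeds that responded in time, and an error message for each one that did not. The fetch parameter exists so that the download step can be replaced, for instance in tests.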
Validation Failure
Another major cause of a broken feed is where the feed itself does not validate properly. Sometimes errant HTML on a page can cause a CMS or builder platform to create malformed XML. This can also be intermittent, for example where a site is updated regularly and only some posts contain the problematic markup. A misplaced image, incorrect authorship details or simply badly coded HTML can cause a myriad of XML problems and result in a feed which either doesn’t display properly or doesn’t display at all.
Invalid Feed URL Errors
A common error is when users try to enter a URL into the FeedWind setup that is not actually an RSS feed URL but simply the URL of a webpage where they want to get a feed from. This brings us to a completely different situation that requires a little understanding of how RSS feeds are created in the first place.
RSS feeds are not an inherent part of a web page design. To get the XML that forms an RSS feed, some kind of parsing and curation of content must take place. Anyone familiar with popular CMSs will know that RSS feeds are usually native to the CMS platform, so to get a feed from a WordPress page you simply add /rss to the end of the URL. If your site is https://www.example.com and it was built using WordPress, then the RSS feed for that site is simply: https://www.example.com/rss
If you build a site using plain HTML and publish it at https://www.my-html-only-site.com, you will not automatically get an RSS feed URL at https://www.my-html-only-site.com/rss. The only way to get an RSS feed from such a site is to use 3rd party software or services. FetchRSS.com, mentioned earlier, is an example of such a service.
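One way to tell a feed URL from an ordinary page URL is to look at the root element of whatever the URL returns. The Python sketch below takes the downloaded text and reports whether it looks like RSS, Atom or RDF-based RSS 1.0, or none of these. It is a rough heuristic under the assumption that the content has already been fetched; the function name detect_feed_format is illustrative.

```python
import xml.etree.ElementTree as ET

# Root element names (without namespace) mapped to the feed family they indicate.
ROOT_TO_FORMAT = {"rss": "rss", "feed": "atom", "RDF": "rdf"}


def detect_feed_format(text):
    """Return 'rss', 'atom', 'rdf', or None if the text is not a feed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None  # many HTML pages are not well-formed XML
    # Strip any '{namespace}' prefix that ElementTree puts on the tag name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return ROOT_TO_FORMAT.get(local_name)


print(detect_feed_format('<rss version="2.0"><channel/></rss>'))
print(detect_feed_format("<html><body><p>Just a web page<br></body></html>"))
```

A result of None for a URL entered into a feed reader is a strong hint that the address points at a web page and not at its feed.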
Software interaction problems – jQuery and JavaScript
There are many other ways in which the FeedWind widget can interact with website code. FeedWind relies on JavaScript and, as with any other web coding language, this can lead to issues with IP blocking, JavaScript blocking and the many other restrictions that web hosting providers and other networks place on their services.
Usually these problems are traceable and occur during execution of the page at load time. They can generally be worked around or fixed by version upgrades or by applying a different approach. Software such as jQuery can cause issues too, but Microsoft’s Internet Explorer series of browsers, with their lack of support for commonly used HTML, are also often to blame for styling problems or complete widget failure.
Code-stripping by CMS and other website building platforms
An unfortunate habit of many CMS editors is that they perform a “clean up” of HTML and CSS styling to remove unclosed HTML tags or duplicated styling tags. Some CMSs also “minify” the code to result in fewer tags overall.
The reasoning behind some of these clean-up activities is a mystery. To make things worse, the code that is stripped out differs from one CMS to another. Code can disappear or be integrated into other code, or have tags inserted or removed. This is presumably done in order to maintain certain functionality or streamline the page for better performance, but it can have adverse effects too.
This cleanup of code upon saving can sometimes have the effect of rendering the FeedWind code non-functional. WordPress, for example, can sometimes add [CDATA] entries into the middle of the FeedWind snippet and invalidate the FeedWind script as a result. Other CMSs treat JavaScript in different ways, sometimes loading it ahead of other page scripts, which slows down page loading times considerably. Software used to ‘minify’ JavaScript can also cause problems.
These problems are not intrinsic to FeedWind; many other JavaScript and HTML/PHP widgets and programs suffer from exactly the same sort of issues. The complexity of many of today’s websites means that the problem only gets worse as time goes by. By contrast, problems with RSS XML itself are few and far between, and this resilience is one reason why RSS is still so popular. Simplicity is the key, and it is reflected in the fact that the RSS 2.0 specification is only a few pages long.
Feed content, layout and structure
The feed content in an RSS feed is dependent on the layout of the source site and the way it was designed to deliver RSS. WordPress, for example, has RSS feed functionality built in and allows any content to be syndicated easily via RSS. These feeds are well formed with structured layouts. A feed from another source can be a little less organized and offer a poorer user experience as a result.
A site built with Dreamweaver, for example, would require an RSS feed to be created manually or via third party software, as there is no native support for RSS in Dreamweaver. An RSS feed can be manually created in XML. Most popular CMSs provide RSS functionality, either natively or via a plugin, widget or extension, so RSS links are automatically available to users, typically through an orange RSS icon.
There are programs and websites available which can create these feeds. The complexity of these RSS feed creators varies, as does the quality and consistency of the feeds. Some require knowledge of XML; others, like FeedBurner, simply extract information automatically from a given URL. Dedicated software can take a little time to set up, depending on how complex the feed is. The more tags that are used and the more media types in the content, the more time it takes to create and maintain a feed.
Creating your own RSS XML
Creating XML manually requires knowledge of HTML/XML and the RSS specification. This is not a beginner’s task, although a simple RSS XML file is not too hard to create. Here is an example of a simple feed:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>My Example Feed</title>
<link>https://www.example.com/</link>
<description>RSS feed from the example.com blog has news and the latest information about current events.</description>
<item>
<title>Content Title</title>
<link>https://www.example.com/blog/post140644.html</link>
<!-- guid: a unique ID for the content. You can use the URL again if necessary, otherwise it must be a unique identifier of your choosing. -->
<guid>https://www.example.com/blog/post140644.html</guid>
<pubDate>Wed, 27 Nov 2013 15:17:32 GMT</pubDate>
<description>Today’s football news – Nov 7th 2014</description>
</item>
</channel>
</rss>
Writing your own feed XML
It is possible to use an HTML editor to write your own XML code and create a feed yourself. The downside to doing this is that every time a new item of content is added to your blog or website, the XML must be updated to include that new item.
For many blogs and smaller sites this is not a big problem, as the rate at which updates are required is relatively low. However, for a site with a lot of updates, or a situation where there are multiple feeds in the XML, the amount of effort required to update is not justified when compared to the efficiency of an automated feed widget.
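For a site that updates often, the XML can be generated from a list of posts instead of being edited by hand. The Python sketch below builds an RSS 2.0 document with the standard library. The function name build_feed and the post dictionary keys (title, url, published, summary) are assumptions made for this example, and each published value is expected to be a timezone-aware UTC datetime.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, posts):
    """Build an RSS 2.0 document (as a string) from a list of post dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
        # The post URL doubles as the unique identifier here.
        ET.SubElement(item, "guid").text = post["url"]
        # RSS dates use the RFC 822 style, e.g. "Wed, 27 Nov 2013 15:17:32 GMT".
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"], usegmt=True)
        ET.SubElement(item, "description").text = post["summary"]
    return ET.tostring(rss, encoding="unicode")


example_posts = [
    {
        "title": "Content Title",
        "url": "https://www.example.com/blog/post140644.html",
        "published": datetime(2013, 11, 27, 15, 17, 32, tzinfo=timezone.utc),
        "summary": "Today's football news",
    }
]
feed_xml = build_feed(
    "My Example Feed", "https://www.example.com/", "News from the example.com blog.", example_posts
)
print(feed_xml)
```

Re-running a script like this whenever a post is published keeps the feed in step with the site, which is essentially what a CMS with native RSS support does for you.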
Feed validation
Once you have created a feed it is easy to check if it has been made correctly. There are a few validation services on offer where you can enter an RSS feed URL and check its validity. These include:
https://validator.w3.org/feed/
https://www.rssboard.org/rss-validator/
RSS versions
There are five versions of RSS and two versions of Atom:
RSS versions: 0.9 – 0.91 – 0.92 – 1.0 – 2.0
Atom versions: 0.3 – 1.0
RSS 2.0 and Atom 1.0 are the most common. FeedWind supports all these versions.
Backwards Compatibility
RSS version 2.0 is backwards compatible with the 0.91 and 0.92 versions, containing the same tags but with some new additional tags available. RSS 1.0 is based on RDF and is structured differently.
Feed auto-discovery
There are times when you may need to find out whether a web page already has an RSS feed enabled. Most feeds are auto-discoverable; in other words, they can be found using software.
Although there are sites online that allow you to check a URL, it is just as easy to open the page in your browser and look for an RSS icon in or near the address bar. If the icon appears, a feed is available. This works in browsers that include feed detection, either natively or through a plugin. Clicking on the icon will give you the feed URL or offer you the option to ‘subscribe’ to that feed.
The rules relating to feed auto-discovery can be found here:
http://www.rssboard.org/rss-autodiscovery
There are a number of RSS subscription plugins that use this feature to find feeds on a page. FeedBurner also uses auto-discovery to find RSS feeds.
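Auto-discovery works by looking in a page's HTML for link elements with rel="alternate" and a feed MIME type. The Python sketch below shows the idea using the standard library's HTML parser. It assumes the page HTML has already been downloaded, and the names FeedLinkFinder and discover_feeds are illustrative and do not come from any particular plugin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that mark a <link> element as pointing to a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime_type in FEED_TYPES and href:
            # Resolve relative links against the address of the page.
            self.feed_urls.append(urljoin(self.base_url, href))


def discover_feeds(html_text, base_url):
    """Return the feed URLs advertised in a page's HTML."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feed_urls


PAGE_HTML = """
<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Blog" href="/blog/feed/">
</head><body></body></html>
"""
print(discover_feeds(PAGE_HTML, "https://www.example.com/blog/"))
```

A page with no matching link elements produces an empty list, which usually means a feed must be found or created by other means.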
Resources
RSS V1.0 specification https://web.resource.org/rss/1.0/
RSS V2.0 Specification https://cyber.law.harvard.edu/rss/rss.html
RSS advisory Board https://www.rssboard.org
W3C RSS feed Validation service https://validator.w3.org/feed/
RSS feed auto-discovery rules http://www.rssboard.org/rss-autodiscovery