Show HN: Feedsmith — Fast parser & generator for RSS, Atom, OPML feed namespaces
(github.com)
Hi HN! While working on a project that involves frequently parsing large numbers of feeds, I needed a fast JavaScript-based parser that could extract specific fields from feed namespaces. Existing Node packages were either too slow or merged all feed formats together, losing namespace information. So I decided to write it myself and created this NPM package with a simple API.
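To make "losing namespace information" concrete, here is a minimal sketch using Python's standard-library ElementTree. It is not Feedsmith, whose API isn't shown in this thread. The sample feed, the `extract_items` helper and the flat `prefix:name` output keys are illustrative assumptions. Each namespaced element is looked up by its namespace URI, so fields like `itunes:duration` or `dc:creator` stay distinct from plain RSS fields.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for a few of the namespaces discussed in the thread.
NS = {
    "itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
    "media": "http://search.yahoo.com/mrss/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample feed used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
     xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <dc:creator>Jane Doe</dc:creator>
      <itunes:duration>00:42:10</itunes:duration>
      <media:thumbnail url="https://example.com/ep1.jpg"/>
    </item>
  </channel>
</rss>"""


def extract_items(xml_text):
    """Return one dict per <item>, keeping namespaced fields separate."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iterfind("channel/item"):
        # "media:thumbnail" is expanded via NS to
        # "{http://search.yahoo.com/mrss/}thumbnail" before matching.
        thumb = item.find("media:thumbnail", NS)
        items.append({
            "title": item.findtext("title"),
            "dc:creator": item.findtext("dc:creator", namespaces=NS),
            "itunes:duration": item.findtext("itunes:duration", namespaces=NS),
            "media:thumbnail": thumb.get("url") if thumb is not None else None,
        })
    return items
```

A parser that flattens everything into one generic item shape would have no obvious place for `itunes:duration` or the thumbnail URL. Keying by namespace keeps them addressable.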
Feedsmith supports all the common feed formats and many popular namespaces, including Podcast, Media, iTunes, Dublin Core, and more. It can also parse and generate OPML files.
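For a rough picture of what parsing and generating OPML involves, here is a round-trip sketch with Python's standard library. Again, this is not Feedsmith's API. It assumes the common OPML 2.0 subscription-list layout, where each feed is an `<outline type="rss">` element carrying `text` and `xmlUrl` attributes. `generate_opml` and `parse_opml` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def generate_opml(title, feeds):
    """Build an OPML 2.0 subscription list from (text, xml_url) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, xml_url in feeds:
        # Keyword arguments become attributes: type, text and xmlUrl.
        ET.SubElement(body, "outline", type="rss", text=text, xmlUrl=xml_url)
    return ET.tostring(opml, encoding="unicode")


def parse_opml(opml_text):
    """Return (text, xmlUrl) pairs for every outline that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    # iter() also walks nested outlines, e.g. feeds grouped into folders.
    return [
        (node.get("text"), node.get("xmlUrl"))
        for node in root.iter("outline")
        if node.get("xmlUrl")
    ]
```

Usage: `parse_opml(generate_opml("My feeds", feeds))` returns the same `(text, xmlUrl)` pairs that went in.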
I am currently adding support for more namespaces and feed generation for RSS, Atom and RDF. The library grew into something bigger than I initially anticipated, so I also started creating a dedicated documentation website to describe all the features.
Great job! I'm the creator of RSSHub (https://github.com/DIYgod/RSSHub) and Folo (https://github.com/RSSNext/Folo). I previously used rss-parser and ran into some issues. Feedsmith has features that interest me, so I'll give it a try!
I was in the same situation: I was using rss-parser but eventually ran into some issues.
It also turned out that its performance isn't great: in benchmarks it's 4-5x slower than Feedsmith. In my case, switching to Feedsmith almost doubled the overall parsing speed, and that figure includes the time spent fetching the feeds, which is the main bottleneck.
P.S. Great projects; I know and follow both. :)
Nice project! Good job!
Someone might also find what I've built interesting.
- I decided that implementing an RSS reader for the 100th time was really stupid, so naturally I wrote my own [0]
- my RSS reader takes the form of an API [1], which I use for crawling
- it can be installed via Docker. The user only has to parse the JSON returned by the API; there's no need to deal with requests, browsers, or status codes
- my weapon of choice is Python. There is the Python feedparser package, but I had problems using it in parallel because of some XML shenanigans and errors
- my reader serves crawling purposes, so I'm only interested in the most basic elements, like thumbnails; all the nuance of RSS is lost
- it detects feeds on sites automatically
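Automatic feed detection is commonly done through HTML autodiscovery. A page advertises its feeds with `<link rel="alternate" type="application/rss+xml" href="...">` tags. Below is a minimal sketch of that technique using Python's standard library. It is not crawler-buddy's actual code; the `discover_feeds` helper and the list of MIME types are assumptions. It only inspects `<link>` tags in HTML you already have. It does not fetch pages or probe guessed URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Common feed MIME types (non-exhaustive, an assumption for this sketch).
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/feed+json",
}


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> tags that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and also routes <link ... /> here.
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower().strip()
        href = attrs.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs such as "/feed.xml" against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html_text, base_url):
    """Return absolute feed URLs advertised in the page's <link> tags."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.feeds
```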
Links
[0] https://github.com/rumca-js/crawler-buddy/blob/main/src/webt...
[1] https://github.com/rumca-js/crawler-buddy
Looks great! Do you have any benchmarks comparing the performance with similar packages?
Thanks! For now I only have parsing benchmarks, since parsing has been my main focus regarding performance. Feedsmith consistently ranks in the top two, with the caveat that the other libraries don't support most of the feed namespaces that Feedsmith does.
Here are the results: https://github.com/macieklamberski/feedsmith/blob/main/bench....
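For anyone who wants to run this kind of comparison themselves, here is a minimal sketch of a parse-only benchmark harness in Python using `timeit`. It is not the harness behind the linked results. The two standard-library XML parsers are placeholders for the libraries under test, and the synthetic feed is an assumption. Parsing an in-memory string keeps network fetching out of the measurement.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical input: a synthetic RSS feed with 500 items.
ITEM = "<item><title>Post {0}</title><link>https://example.com/{0}</link></item>"
FEED = '<rss version="2.0"><channel><title>Bench</title>{}</channel></rss>'.format(
    "".join(ITEM.format(i) for i in range(500))
)

# Placeholder "parsers": two stdlib XML parsers, not real feed libraries.
CANDIDATES = {
    "ElementTree": ET.fromstring,
    "minidom": xml.dom.minidom.parseString,
}


def bench(candidates, data, number=20, repeat=5):
    """Return the best per-call time in seconds for each candidate."""
    results = {}
    for name, parse in candidates.items():
        times = timeit.repeat(lambda: parse(data), number=number, repeat=repeat)
        # The minimum of several repeats is the least noisy estimate.
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    ranked = sorted(bench(CANDIDATES, FEED).items(), key=lambda kv: kv[1])
    for name, secs in ranked:
        print(f"{name:12s} {secs * 1e3:.3f} ms/parse")
```

For a fair comparison between feed libraries, each candidate should also produce comparable output. A parser that skips namespaces does less work than one that extracts them.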
Well done, congrats! Those are great looking results!
Well done!
How does Python's feedparser compare to this library?
Hard to say, as I'm not very familiar with Python. However, from what I understand, JavaScript is generally faster (don't quote me on that).
This is quite an interesting idea — benchmarking feed parsing libraries in different languages. I'll give it some thought.