Show HN: Feedsmith — Fast parser & generator for RSS, Atom, OPML feed namespaces

70 points by macieklamberski 2 months ago

Hi HN! While working on a project that involves frequently parsing a lot of feeds, I needed a fast JavaScript-based parser to extract specific fields from feed namespaces. Existing Node packages were either too slow or merged all feed formats, losing namespace information. So I decided to write it myself and created this NPM package with a simple API.

Feedsmith supports all feed formats and many popular namespaces, including Podcast, Media, iTunes, Dublin Core, and more. It can also parse and generate OPML files.
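
To make the namespace point concrete, here is a minimal sketch of the difference between merging fields and keeping them grouped by namespace. It is not Feedsmith's API or implementation. The `FeedElement` type, the `flatten` and `groupByNamespace` functions, and the sample data are all hypothetical. The sketch also groups by prefix for brevity, whereas real XML namespaces are identified by their URI.

```typescript
// Hypothetical shape: one parsed XML element reduced to its qualified name and text.
type FeedElement = { qname: string; text: string }

// Split "prefix:local" into its parts; names without a prefix go under "core".
function splitQName(qname: string): { prefix: string; local: string } {
  const i = qname.indexOf(':')
  return i === -1
    ? { prefix: 'core', local: qname }
    : { prefix: qname.slice(0, i), local: qname.slice(i + 1) }
}

// Merged approach: only the local name is kept, so elements that share a
// local name overwrite each other and their origin is lost.
function flatten(elements: FeedElement[]): Record<string, string> {
  const out: Record<string, string> = {}
  for (const el of elements) {
    out[splitQName(el.qname).local] = el.text
  }
  return out
}

// Namespace-preserving approach: one bucket per prefix.
function groupByNamespace(
  elements: FeedElement[],
): Record<string, Record<string, string>> {
  const out: Record<string, Record<string, string>> = {}
  for (const el of elements) {
    const { prefix, local } = splitQName(el.qname)
    const bucket = out[prefix] ?? (out[prefix] = {})
    bucket[local] = el.text
  }
  return out
}

// Sample item with a plain title, an iTunes title, and a Dublin Core creator.
const item: FeedElement[] = [
  { qname: 'title', text: 'Episode 1' },
  { qname: 'itunes:title', text: 'Ep. 1' },
  { qname: 'dc:creator', text: 'Jane' },
]

console.log(flatten(item))
console.log(groupByNamespace(item))
```

With the sample data, `flatten` ends up with a single `title` because `itunes:title` overwrites the plain one, while `groupByNamespace` keeps both under `core` and `itunes`.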

I am currently adding support for more namespaces and feed generation for RSS, Atom and RDF. The library grew into something bigger than I initially anticipated, so I also started creating a dedicated documentation website to describe all the features.

DIYgod 2 months ago

Great job! I'm the creator of RSSHub (https://github.com/DIYgod/RSSHub) and Folo (https://github.com/RSSNext/Folo). I previously used rss-parser and ran into some issues. Feedsmith has features that interest me, so I'll give it a try!

macieklamberski 2 months ago

I was in the same situation: using rss-parser, but eventually running into some issues.
It also turned out that its performance is not that good. In benchmarks, it's 4-5x slower than Feedsmith. In my case, switching to Feedsmith almost doubled the overall parsing speed. That figure includes fetching the feeds, which is the main bottleneck.
P.S. Great projects, I know and follow both. :)

renegat0x0 2 months ago

Nice project! Good job!

Somebody might also find what I have done interesting.

- I decided that implementing an RSS reader for the 100th time is really stupid, so naturally I wrote my own [0]

- My RSS reader takes the form of an API [1], which I use for crawling

- It can be installed via Docker. The user only has to parse the JSON returned by the API; there is no need to deal with requests, browsers, or status codes

- My weapon of choice is Python. There is the Python feedparser package, but I had problems using it in parallel because of some XML shenanigans and errors

- My reader serves a crawling purpose, so I am only interested in the most basic elements, like thumbnails; all the nuance of RSS is lost

- It detects feeds on sites automatically

Links

[0] https://github.com/rumca-js/crawler-buddy/blob/main/src/webt...

[1] https://github.com/rumca-js/crawler-buddy

piotrkulpinski 2 months ago

Looks great! Do you have any benchmarks comparing the performance with similar packages?

macieklamberski 2 months ago

Thanks! For now I only have benchmarks for parsing, as that has been my main focus regarding performance. Feedsmith consistently ranks in the top 2, with the caveat that other libraries do not support most of the feed namespaces that Feedsmith does.
Here are the results: https://github.com/macieklamberski/feedsmith/blob/main/bench....
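
For readers who want to run a similar comparison on their own feeds, a parsing benchmark can be sketched roughly as below. This is not the harness behind the linked results. The `timeMean` and `compare` helpers are hypothetical, and `parseA` and `parseB` are trivial stand-ins for real parser functions.

```typescript
// Time `fn` over `runs` iterations and return the mean duration in milliseconds.
function timeMean(fn: () => void, runs: number): number {
  if (runs <= 0) throw new RangeError('runs must be positive')
  fn() // one untimed warm-up call
  const start = performance.now()
  for (let i = 0; i < runs; i++) fn()
  return (performance.now() - start) / runs
}

// Run every parser against the same input and sort the results, fastest first.
function compare(
  parsers: Record<string, (xml: string) => unknown>,
  xml: string,
  runs: number,
): Array<{ name: string; meanMs: number }> {
  return Object.entries(parsers)
    .map(([name, parse]) => ({
      name,
      meanMs: timeMean(() => {
        parse(xml)
      }, runs),
    }))
    .sort((a, b) => a.meanMs - b.meanMs)
}

// Hypothetical stand-ins: swap in calls to the real libraries under test.
const sampleXml = '<rss><channel><title>Example</title></channel></rss>'
const parseA = (xml: string): number => xml.length
const parseB = (xml: string): number => xml.split('<').length

console.table(compare({ parseA, parseB }, sampleXml, 1000))
```

A fair comparison would also use real feeds of varying sizes, since a parser that extracts more namespaces does more work per feed.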

jauntywundrkind 2 months ago

Well done, congrats! Those are great-looking results!

urbanisierung 2 months ago

Well done!

vivzkestrel 2 months ago

How does feedparser in Python compare with this library?

macieklamberski 2 months ago

Hard to say, as I'm not very familiar with Python. However, from what I understand, JavaScript is generally faster (don't quote me on that).
This is quite an interesting idea — benchmarking feed parsing libraries in different languages. I'll give it some thought.