Show HN: Feedsmith — Fast parser & generator for RSS, Atom, OPML feed namespaces

github.com

63 points by macieklamberski a day ago

Hi HN! While working on a project that involves frequently parsing a lot of feeds, I needed a fast JavaScript-based parser to extract specific fields from feed namespaces. Existing Node packages were either too slow or merged all feed formats, losing namespace information. So I decided to write it myself and created this NPM package with a simple API.

Feedsmith supports all feed formats and many popular namespaces, including Podcast, Media, iTunes, and Dublin Core. It can also parse and generate OPML files.
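
To make the namespace point concrete, here is a minimal sketch of what keeping namespace information can look like. It is an illustration only and not Feedsmith's API: the `FlatItem` and `GroupedItem` shapes and the `groupByNamespace` helper are hypothetical, and the input is assumed to be an already-parsed item keyed by qualified element names such as `itunes:author`.

```typescript
// Hypothetical shapes for illustration; this is not Feedsmith's API.
type FlatItem = Record<string, string>;
type GroupedItem = {
  core: Record<string, string>;
  namespaces: Record<string, Record<string, string>>;
};

// Split qualified names such as "itunes:author" into prefix and local name,
// so fields from different namespaces stay separate instead of being merged.
// Real feeds bind prefixes to namespace URIs, so a robust parser would key on
// the URI; prefixes are used here for brevity.
function groupByNamespace(item: FlatItem): GroupedItem {
  const grouped: GroupedItem = { core: {}, namespaces: {} };
  for (const qualifiedName of Object.keys(item)) {
    const value = item[qualifiedName];
    if (value === undefined) continue;
    const separator = qualifiedName.indexOf(":");
    if (separator === -1) {
      // No prefix: a core element of the feed format itself.
      grouped.core[qualifiedName] = value;
      continue;
    }
    const prefix = qualifiedName.slice(0, separator);
    const localName = qualifiedName.slice(separator + 1);
    const bucket = grouped.namespaces[prefix] || {};
    bucket[localName] = value;
    grouped.namespaces[prefix] = bucket;
  }
  return grouped;
}

// Made-up sample item with two different "author-like" fields.
const episode = groupByNamespace({
  title: "Episode 12",
  "itunes:author": "Jane Doe",
  "dc:creator": "J. Doe",
});
```

A parser that folds both `itunes:author` and `dc:creator` into a single author field cannot tell you which one the feed actually provided; grouping by prefix keeps that distinction.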

I am currently adding support for more namespaces and feed generation for RSS, Atom and RDF. The library grew into something bigger than I initially anticipated, so I also started creating a dedicated documentation website to describe all the features.

DIYgod 16 hours ago

Great job! I'm the creator of RSSHub (https://github.com/DIYgod/RSSHub) and Folo (https://github.com/RSSNext/Folo). I previously used rss-parser and ran into some issues. Feedsmith has features that interest me, so I'll give it a try!

  • macieklamberski 9 hours ago

    I was in the same situation: using rss-parser, but eventually running into some issues.

    It also turned out that its performance is not that good: in benchmarks, it's 4-5x slower than Feedsmith. In my case, switching to Feedsmith almost doubled the overall parsing speed, and that includes fetching the feeds, which is the main bottleneck.

    PS. Great projects, I know and follow both. :)

renegat0x0 a day ago

Nice project! Good job!

Some of you might also find what I've built interesting:

- I decided that implementing an RSS reader for the 100th time is really stupid, so naturally I wrote my own [0]

- my RSS reader takes the form of an API [1], which I use for crawling

- it can be installed via Docker. The user only has to parse the JSON returned by the API; no need to deal with requests, browsers, or status codes

- my weapon of choice is Python. There is the Python feedparser package, but I had problems using it in parallel because of some XML shenanigans and errors

- my reader serves a crawling purpose, so I am interested in the most basic elements, like thumbnails; all the nuance of RSS is lost

- it detects feeds from sites automatically
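
One common way to detect feeds automatically is feed autodiscovery: pages advertise their feeds with `<link rel="alternate">` elements in the HTML head. Below is a minimal TypeScript sketch of that technique. It is not how crawler-buddy does it (that project is written in Python and its approach is not described here); `discoverFeeds` and `getAttribute` are hypothetical names, and a simple regex scan stands in for a real HTML parser.

```typescript
// MIME types commonly used on autodiscovery <link> elements.
const FEED_TYPES = ["application/rss+xml", "application/atom+xml"];

// Read one attribute value from a tag's source text (quoted values only).
function getAttribute(tag: string, name: string): string | undefined {
  const pattern = new RegExp(`\\s${name}\\s*=\\s*["']([^"']*)["']`, "i");
  const match = pattern.exec(tag);
  return match ? match[1] : undefined;
}

// Return absolute feed URLs advertised via <link rel="alternate" type="...">.
function discoverFeeds(html: string, pageUrl: string): string[] {
  const feeds: string[] = [];
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    const rel = (getAttribute(tag, "rel") || "").toLowerCase();
    const type = (getAttribute(tag, "type") || "").toLowerCase();
    const href = getAttribute(tag, "href");
    if (!href) continue;
    // rel may hold several space-separated tokens, e.g. "alternate home".
    const isAlternate = rel.split(/\s+/).indexOf("alternate") !== -1;
    const isFeedType = FEED_TYPES.indexOf(type) !== -1;
    if (isAlternate && isFeedType) {
      // Resolve relative hrefs against the URL of the page they came from.
      feeds.push(new URL(href, pageUrl).toString());
    }
  }
  return feeds;
}
```

Calling `discoverFeeds(html, "https://example.com/blog/")` on a page whose head contains `<link rel="alternate" type="application/rss+xml" href="/feed.xml">` resolves the relative `href` against the page URL. Sites that do not advertise their feeds this way need other heuristics, which this sketch does not cover.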

Links

[0] https://github.com/rumca-js/crawler-buddy/blob/main/src/webt...

[1] https://github.com/rumca-js/crawler-buddy

piotrkulpinski a day ago

Looks great! Do you have any benchmarks comparing the performance with similar packages?

vivzkestrel 15 hours ago

How does feedparser in Python compare with this library?

  • macieklamberski 10 hours ago

    Hard to say, as I'm not very familiar with Python. However, from what I understand, JavaScript is generally faster (don't quote me on that).

    This is quite an interesting idea — benchmarking feed parsing libraries in different languages. I'll give it some thought.