Show HN: Cleverb.ee – open-source agent that writes a cited research report

67 points by nickwatson 6 months ago

cantaloupe 6 months ago

I browsed the GitHub and website for a bit, but didn’t see any examples! It would be useful to share the output for a common question that can be substantiated with reliable sources, like “Is coffee good for me?”. Even better if you can show a comparison to other deep research tools. From the copy it seems like Cleverbee could pull from more diverse sources (e.g. YouTube) and fewer unreliable sources (e.g. product blogs). Show that off!

nickwatson 6 months ago

Done! https://docs.google.com/document/d/1bGVkI3xaBP1AvRxkB4GKeL0N...
- endianswap 6 months ago
  
  lol what
  > Reference to failed web browser attempt for Rush University
  - nickwatson 6 months ago
    
    Good spot! I already am on the case with that one.
    The Rush University website takes a long time to load (check it out), and the script recognized the article had loaded but something on the website was causing it to hang (waiting for networkidle status) so it terminated the parsing early and worked with the content it had.
    So in a nutshell, it still parsed but with a warning/error that this happened.
    I'm optimizing this now to exclude the wording from the report, or make a note.
  - nickwatson 6 months ago
    
    Fixed, and pushed to the repo.
    If it hits a timeout but the content is still there, it will now return the content (without the error message) to ensure something small like this doesn't get synthesized.
    That's the thing about sytnthesizing and I feel one of the strengths of this system, is the LLM doesn't put too much of it's own "creativity" (which usually comes with hallucination) on to the top of the findings!
    TL;DR: Fixed.
  - NitpickLawyer 6 months ago
    
    > Reference to failed web browser attempt for Rush University
    Leslie, I typed your symptoms into the thing up here, and it says you could have network connectivity problems.
nickwatson 6 months ago

I think that's a very good idea, thanks!
I'm actually working on "The Beehive" at the moment, where the app can push the research to a hive on the website, so people can share their research/discoveries.
My client's paid work takes priority, but I hope to do it over the course of this week.
P.s. Running report now for you, "Is coffee good for me?" to show you this example ;)

nickwatson 6 months ago

Thanks for all the thoughtful feedback today, everyone.

I’m logging the ideas (grounding, source ranking, etc.) and will open issues tonight.

Heading offline now but I’ll circle back tomorrow. Feel free to keep the questions coming!

smallnix 6 months ago

How do you do the citing? Reverse-RAG post processing?

nickwatson 6 months ago

Good question — it's pretty straightforward right now:
I pass the collected content chunks (with their original URLs attached) into Gemini 2.5 Pro, asking it to synthesize a balanced report and to inline citations throughout.
So it's not doing anything fancy like dynamic retrieval or classic RAG architecture.
Basically: - The agent gathers sources (webpages, PDFs, Reddit, etc.) - Summarises each as it goes (using a cheaper model) - Then hands a bundle of summarised + raw content to Gemini 2.5 Pro - Gemini 2.5 Pro writes the final report, embedding links directly as citations with [1], [2], etc style citations throughout.
Reverse-RAG is something I for sure want to implement. Once I can afford a better computer to run this with at scale. Even an 8B model will take overnight to summarize an average piece of content for me right now! But I'm also keeping an eye on the pace of which AI moves in the larger LLM space. The size and abilities of likes of Gemini 2.5 Pro context windows are pretty crazy these days!
Thanks for the question.
- iamandoni 6 months ago
  
  Do you take any measures to prevent link hallucination? And content grounding / attribution verification?
  - nickwatson 6 months ago
    
    At the moment the measures taken are:
    - Full content analysis by Primary LLM (Default is Gemini 2.5 Pro) with link hard-coded alongside each piece of content with structured output for better parsing. - Temperature right down (0.2), strict instructions to synthesize, precise prompts to attribute links exactly and without modification.
    What I hope to introduce:
    - Hard-coded parsing of links mentioned in final report to verify with the link map created throughout the research journey - Optional, "double-checking" LLM review of synthesized content to ensure no drift. - RAG enhancements for token-efficient verification and subsequent user questions (post-research)
    Do you have any further suggestions?
    Right now I hope to strike the delicate balance between token efficiency, with enhanced grounding as optional settings in the future. I have a big task list of things, and this is one of them. I will ensure to re-prioritize alongside user requests for the different features.
    Of course, being open source, contributions are highly welcome. I would love to see large community involvement. Collaboration benefits everyone.
    P.s. I have spent hundreds of dollars in tests. I'd say for every 1 hour of building, about 3 hours of testing have gone into this, debugging, optimizing quality, ensuring guard-rails are in place.
    If you go to the repo, also check out the config/prompts.py file - it will give you a little more insight into what is going on (there are code checks as well, but generally it gives you an idea).

nickwatson 6 months ago

Hi HN

I built *cleverb.ee* to solve my own research pain: too many tabs, too much messy sourcing. Gemini and OpenAI deep research tools weren't getting the balanced/unbiased quality I desired.

*What it does*: • Reads webpages, PDFs, Reddit posts, PubMed abstracts, YouTube transcripts. • Plans with Gemini 2.5 Pro, acts with Gemini 2.5 Flash, summarises with Gemini 2.0 Flash (or you can use any Local LLM or Claude) • Outputs a fact-checked, cited Markdown report with live token usage tracking.

*Tech*: Python + Playwright + LangChain with MCP tool support. AGPL-3.0 licensed.

*Why open source?*: I wanted full transparency at every agent step and easy pluggable data sources.

Quick install:

```bash git clone https://github.com/SureScaleAI/cleverbee cd cleverbee && bash setup.sh

Would love feedback — especially what critical research sources you’d want integrated next!

kleiba 6 months ago

> Gemini and OpenAI deep research tools weren't getting the balanced/unbiased quality I desired.
Could you elaborate, please?
- nickwatson 6 months ago
  
  I felt they would just "cast the net wide" with a quick search-collect at scale, then load in the LLMs own training on to the top, and reports I generated were giving me hallucinated content.
  I wanted something more - collect-evaluate-decide loop to iterate through discoveries and actively seek out diverse sources.
Quanttek 6 months ago

Can you specify ? The default heavy reliance on Reddit and YouTube, rather than trusted publications (e.g. Scientific American, NYTimes) and scientific publications, is worrying given widespread misinformation in certain scientific fields (e.g. nutrition, health, economics)
- nickwatson 6 months ago
  
  I never said "heavy reliance" on Reddit/YouTube. It actually is requested to use discernment to recognize poor, or biased sources and opinions, and label them as such (see the example report on Coffee which I shared previously in another comment).
  Most the time it has only sought out one or two post/youtube videos as it can recognize the low credibility value.
  It comes loaded with a PubMed MCP tool and the beauty of it being open source is you can exclude or limit the sources as much as you want, or add in new sources - that's why I wanted to open it up, to allow for critique over methodologies and allow for improved, balanced research from experts.
  It is also requested to evaluate the source and whether or not they have "some benefit to gain" from the article, to ensure it balances this into the research, also.

devmor 6 months ago

This is not research. This is a search engine.

dackdel 6 months ago

this looks useful!!!!

semi-extrinsic 6 months ago

I think it's really unfortunate that this type of thing gets called "research". I get that it fits with what has unfortunately become modern day usage - "Karen did her own research before becoming a flat-earther" - but I really wish the AI companies would've had better faith in their future solutions than to call this research.

There's gotta be quite a few actual researchers at these companies who are shaking their heads.

To spare others the lookup, here's from the Oxford dictionary. Emphasis on the word "new":

  To study a subject in detail, especially in order to discover new information or reach a new understanding.
 

  Example: They are carrying out/conducting/doing some fascinating research into/on the language of dolphins.

nickwatson 6 months ago

I understand what you're saying. I believe any kind of research will nearly always begin with learning and understanding of the knowledge that is already out there.
Almost every subject has been learned this way, whether at school from a teacher or text-book, or reading papers.
The Oxford dictionary definition says the same, "to study a subject in detail". This is what AI is doing - I see it as a "power suit" for distilling information much faster, without the cognitive bias that many of us will carry.
Learning is an important part of research, and this must come with discernment over credibility of existing research, including identifying where the gaps are. This kind of critical thinking allows for another level, experiments, surveys, etc to uncover things even further.
If you were to study the language of dolphins today, where would you start? Would you jump into the ocean and start trying to talk with them, or would you look up what is already discovered? Would you study their behaviors, patterns, etc?
What drove me to do this project is exactly the example you mentioned, the flat-earther type who look up an article on some kind of free hosting website or Sandra from accounts social media page and taken as the be-all-and-end-all of knowledge. It comes without bias recognition or critical thinking skills. This is where I'm hopeful to level the playing field, and ensure unbiased, balanced information is uncovered.
- latexr 6 months ago
  
  > without the cognitive bias that many of us will carry.
  It is naive and incorrect to believe LLMs do not have biases. Of course they do, they are all trained on biased content. There are plenty of articles on the subject.
  > Would you jump into the ocean and start trying to talk with them, or would you look up what is already discovered?
  Why resort to straw men arguments? Of course anyone would start by looking up what has already been discovered, that doesn’t immediately mean reaching for and blindly trusting any random LLM. The first thing you should do, in fact, is figure out which prior research is important and reliable. There are too many studies out there which are obviously subpar or outright lies.
  - nickwatson 6 months ago
    
    I agree, LLMs have biases. It was my primary desire to build this tool and put the weight on the LLMs to synthesize rather than think about and interpret the subjects. It's actually the main goal of this tool - maybe I don't articulate that as well as I could be - I'm open to suggestions here!
    I agree to first figuring out which research is most important and reliable. There is a planning stage, to consider the sources and which ones hold credibility.
    In addition, the user has full control over the sources the tool uses, and even add their own (MCP tools).
    In addition, being open source, you have full control over the flow/prompts/source methods/etc and as a result can optimize this yourself and even contribute improvements to ensure this benefits research as a whole.
    I welcome your feedback, and any code amendments you propose to improve the tool. You clearly understand what makes good research and your contributions will be highly valued by all of us.
    
    hoc 6 months ago
    
    By having to defend your (thesis)/work like this, the whole piece is getting lifted into academic heights in a way, so you could as well keep calling its result and process research :)
    What description would itself come up with, BTW?
    When you anwer with "I agree, LLMs have biases.", I immediately suspect that to be an LLM calming me after correcting it, though. So, the world has definitely changed and we might need to allow for correcting the broadness of words and meanings.
    After all you did not write thesis, scientific research or similar and I remember it being called researching when people went looking up sources (which took them longer than an agent or LLM these days). Compressing that into a report might make it a review, but anyway. Great that you assembled a useful work tool here for some who need exactly that.
    
    nickwatson 6 months ago
    
    I'm very curious. As it's main purpose is built to really search and decide balanced/unbias sources and not really have an opinion, what the result would be of such a question. I'm curious if it will give an answer on this. I just gave it the request:
    "I am struggling what to call this other than "Deep Research tool" as really it is looking online and scanning/synthesizing sources (that's you, by the way!). With that in mind, someone suggested "literature review" but it makes me think of books. I wonder if you can see what this kind of "research" is and suggest a name to describ it based on all the information you uncover on what good research looks like."
    Let's see how it gets on...
    Also, something I think about a lot (you sound like a deep thinker!) - when we discover something that is untrue, can it make it true? (purely hypothetical thought)... if 1000 people were told coffee was bad for them, does the mind-body connection take over and amplify this into reality. We are certainly in interesting times!
    
    nickwatson 6 months ago
    
    Ha, hoc - it was quite interesting to see, and learn a bit about this.
    Apparently the suggested term is "Digital Information Synthesis"
    You can see the report here: https://docs.google.com/document/d/1Vg_UWUPelWohzVGduaKY7Czd...
    This was quite an interesting use case, thanks!
    
    hoc 6 months ago
    
    Well, it seems it uses the word "research" quite a lot for what it's doing (first paragraph and last one with "deep research") but does choose some artificially constructed one as the result.
    Personally I had similar results when searching for known terms for certain concepts that I didn't know the name for. And usually I had to guide the process to find the actual expression used in the domain (it usually made up a lot of fancy and well-fitted names itself). And sometimes it helped to gonback and change the query. Harder if you have to wait that long, though :)
    So.. I guess I would trust its process (mentioning "research" a lot) more than its chosen result :) Not sure if it you wanted it to be a synthesizer or rather an assistant. I'd go with what is closer to your intention or what you think it resulted in in the end.
    An interesting observation might be that guiding that (I don't dare to say it, but anyway) research process might still be an important part and the network's self-evaluation might not be as good as one would need it to be at this point in time. I'd guess it's about adding ones personal judgement early to not end up at the wrong spot after a long processing time. Then again you are using a certain selection of simple and more complex models, so I can't say if there might be a way to have that kind of harsher judgement that one would apply oneself emulated in the proces and what side effects might come form that choice of models you made.
    In the end I was just surprised by the number of picky replies your post got, so I just thought that you discussing the perfect description with the LLM might make be a fun solution. Personally I am a big fan of interactively talking to LLMs at the moment though, so I might be the wrong guy to know your intended audience and use case enough. Just couldn't see how using the term "research" would be the problem.
    
    nickwatson 5 months ago
    
    Thanks for the feedback, hoc. I am already re-coding the whole thing to use more of an Agents/crew style approach where certain roles can be assigned to critique the research, fact-check, etc.
    I re-built with CrewAI but was limited so now doing the same again but using AutoGen.
    The criticism is important to this, as is any research project. That's why good research works, if it is challenged. I'm implementing that "challenging" directly into the LLM as separate agents.
    Balancing token efficiency with regular check-ins and critique is important, and something I want to ensure the user has control over.
    Nearly to version 0.1.3, and much to learn still.
    I loved that use case! I will keep testing, refining and hoping others will do until we get there. Persistence is what matters, to try our best and strive towards collaboration, openness and accuracy.
    Obscurity is dispelled by augmenting the light of discernment, not by attacking the darkness.” - Quote by Socrates.
    
    nickwatson 6 months ago
    
    I updated the Readme now, to describe as "CleverBee: AI-Powered Online Data Information Synthesis Assistant" and put emphasis on the synthesis.
    Also, put a new section in place:
    What cleverb.ee is not Cleverb.ee is not a replacement for deep domain expertise. Despite explicit instructions and low temperature settings, AI can still hallucinate. Always check the sources.
    
    semireg 6 months ago
    
    Did hoc intentionally pun by writing that this meta analysis is getting “lifted”?
    Reference: https://legacy.reactjs.org/docs/higher-order-components.html
    
    hoc 6 months ago
    
    I'd rather identify with the "These docs are old and won’t be updated." part of your linked page.
    
    latexr 6 months ago
    
    > I welcome your feedback, and any code amendments you propose to improve the tool. You clearly understand what makes good research and your contributions will be highly valued by all of us.
    This bit is worded in a way that feels manipulative. Perhaps it’s why your comment is being downvoted. Regardless, I’ll give you the befit of the doubt and believe you’re being honest and replying in good faith; my genuine intentions have been misinterpreted in the past too, and I don’t wish to do it to another.
    I won’t propose any code improvements, because I don’t believe projects like yours are positive to the world. On the contrary, this over-reliance on LLMs and taking their output as gospel will leave us all worse off. What we need is the exact opposite, for people to be actively aware of the inherent flaws in the system and internalise the absolute need to verify.
    
    nickwatson 6 months ago
    
    I'd like to humor you a bit on what you say (going off on a little tangent here).
    - Were all the existing sources (e.g. news, podcasts, etc) ever reliable? - Do people lobby for certain outcomes on some research/articles?
    And finally...
    - Now we know LLMs hallucinate, and news can easily be faked, are people finally starting to question everything, including what they were told before?
    Of course, mostly rhetorical but I think about this a lot - if it is a good or bad thing. Now we know we are surrounded by fakeness that can be generated in seconds, maybe people will finally gain critical thinking skills, and the ability to discern truth from falseness better. Time will tell!
    For now the way I see it is people are becoming reliant on these tools, and only will a community of people collaborating to better the outcomes can ensure alternate agendas do not lead the results.
    
    latexr 6 months ago
    
    > Were all the existing sources (e.g. news, podcasts, etc) ever reliable?
    No, of course not. And I would deeply appreciate if you stopped arguing with straw men. When you do it repeatedly you are either arguing in bad faith or unable to realise you’re doing so, neither of which is positive. Please engage with the given argument, not a weaker version designed to be attacked.
    But I’ll give you the benefit of the doubt once more.
    Provenance matters. I don’t trust everyone I know to be an expert on every subject, but I know who I can trust for what. I know exactly who I can ask biology, medical, or music questions. I know those people will give me right answers and an accurate evaluation of their confidence, or tell me truthfully when they don’t know. I know they can research and identify which sources are trustworthy. I also know they will get back to me and correct any error they may have made in the past. I can also determine who I can dismiss.
    The same is true for searching the web. You don’t trust every website or author, you learn which are trustworthy.
    You don’t have that with LLMs. They are a single source that you cannot trust for anything. They can give you different opposite answers to the same query, all with the same degree of confidence. And no, the added sources aren’t enough because not only are those often wrongly summarised, even stating the opposite, most people won’t ever verify them. Not to mention they can be made to support whatever point the creators want.
    > Now we know LLMs hallucinate, and news can easily be faked, are people finally starting to question everything, including what they were told before?
    No, they are not. That is idyllic and naive and betrays a lack of attention to the status quo. People are tricked and scammed every day by obviously AI pictures and texts, and double down on their wrong beliefs even when something is proven to have been a lie.
- vidarh 6 months ago
  
  A more precise term for what it is doing would be a "literature review".
  But I think you're right to describe it as research in the headline, because a lot of people will relate more to that term. But perhaps describe it as conducting a literature review further down.
  - nickwatson 6 months ago
    
    I agree. In all honesty I was just following on the trend that has been popularized by OpenAI/Google so it is more relatable but will mention the "literature review" as you suggest, it's a good idea.
    I didn't give the wording too much thought in all honesty - was just excited to share.
    Where would you suggest to put the literature review text? Readme.md?
    What about something like "synthesized findings from sources across the internet" or something like that.
    When I see the word literature, I immediately think of books.
- sReinwald 6 months ago
  I really have to challenge the notion of AI "distilling information without cognitive bias.
  First, AI systems absolutely embody cognitive biases - they're just different from human ones. These systems inherit biases from:
  - Their training data (which reflects human biases and knowledge cutoffs) - Architectural decisions made by engineers - Optimization criteria and reinforcement learning objectives - The specific prompting and context provided by users
  An AI doesn't independently evaluate source credibility or apply domain expertise - it synthesizes patterns from its training data according to its programming.
  Second: You frame AI as a "power suit" for distilling information faster. While speed has its place, a core value of doing research isn't just arriving at a final summary. It's the process of engaging with a vast, often messy, diversity of information, facts, opinions, and even flawed arguments. Grappling with that breadth, identifying conflicting viewpoints, and synthesizing them _yourself_ is where deep understanding and critical thinking are truly built.
  Skipping straight to the "distilled information," as useful as it might be for some tasks, feels like reading an incredibly abridged version of Lord of the Rings: A small man finds a powerful ring once owned by an evil God, makes some friends and ends up destroying the ring in a volcano. The end. You miss all the nuance, context, and struggle that creates real meaning and understanding.
  Following on from that, you suggest that this AI-driven distillation then "allows for another level, experiments, surveys, etc to uncover things even further." I'd argue the opposite is more likely. These tools are bypassing the very cognitive effort that develops critical thinking in the first place. The essential practice for building those skills involves precisely the tasks these tools aim to automate: navigating contradictory information, assessing source reliability, weighing arguments, and constructing a reasoned conclusion yourself. By offloading this fundamental intellectual work, we remove the necessary exercise. We're unfortunately already seeing glimpses of this, with people resorting to shortcuts like asking "@Grok is this true???" on Twitter instead of engaging critically with the information presented to them.
  Tools like this might offer helpful starting points or quick summaries, but they can't replicate the cognitive and critical thinking benefits of the research journey itself. They aren't a substitute for the human mind actively wrestling with information to achieve genuine understanding, which is the foundation required before one can effectively design meaningful experiments or surveys.
  - nickwatson 6 months ago
    
    Very true, and it got me thinking a lot.
    As humans, we align to our experiences and values, all of which are very diverse and nuanced. Reminds me of a friend who loves any medical conspiracy theory, whose dad was a bit of an ass to him, and of course, a scientist!
    Without our cognitive biases, are we truly human? Our values; our desired outcomes inherently are part of what shapes us. and of course the sources we choose to trust reinforce this.
    It's this that makes me think AGI can never be achieved, or human-like ability for AI to think, because we are all biased, like it or not. Collectively and through challenging each other, this is what makes society thrive.
    I feel there is no true path towards a single source of truth, but collaboratively we can at least work towards getting there as closely as possible
gnuly 6 months ago

this case should simply be called search, no?
to me research takes a long time, and not just an hour or so.
- pcthrowaway 6 months ago
  
  Well the models often do an initial search and then a follow-up search. So it's a re-search
  - nickwatson 6 months ago
    
    It does it several times, so maybe re-re-re-search works?
- kleiba 6 months ago
  
  You overlook the fact that the system integrates various sources into a coherent report. This in my opinion makes it more than just mere search.
yard2010 6 months ago

New is a matter of perspective.