I have tried this so far for the following repos of interest:
• Apache Pinot - https://deepwiki.com/apache/pinot
• Apache Pulsar - https://deepwiki.com/apache/pulsar
• Apache Superset - https://deepwiki.com/apache/superset
• Grafana - https://deepwiki.com/grafana/grafana
• K8sGPT - https://deepwiki.com/k8sgpt-ai/k8sgpt
Seems, at least at a cursory glance, to produce decent output.
I also asked Devin a question:
Can I use a star-tree index with upserts?
I was pleasantly surprised by the answer:
https://deepwiki.com/search/can-i-use-a-startree-index-wit_a...
Briefly: "No, you cannot use a star-tree index with upserts in Apache Pinot. This is a built-in restriction that is explicitly enforced in the system."
Why This Restriction Exists
When upserts are enabled for a table in Pinot, the system explicitly checks and prevents the use of star-tree indexes. This validation happens during table configuration validation.
The code in TableConfigUtils.java explicitly enforces this restriction:
[snip]
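For readers curious what such a check looks like, here is a hedged sketch of the kind of validation being described — this is not the actual Pinot source, and the class and field names are hypothetical stand-ins. The idea is simply that table-config validation throws when upserts and a star-tree index are configured together, since upserts mutate rows in place while star-tree nodes hold pre-aggregated values that cannot be updated incrementally:

```java
import java.util.List;

public class UpsertValidationSketch {

    // Minimal stand-ins for Pinot's config objects (hypothetical names).
    record StarTreeIndexConfig(List<String> dimensionsSplitOrder) {}
    record TableConfig(boolean upsertEnabled, List<StarTreeIndexConfig> starTreeConfigs) {}

    // Reject the combination of upserts and star-tree indexes at
    // config-validation time, before the table is ever created.
    static void validateUpsertConfig(TableConfig config) {
        if (config.upsertEnabled()
                && config.starTreeConfigs() != null
                && !config.starTreeConfigs().isEmpty()) {
            throw new IllegalStateException(
                "The upsert table cannot have star-tree index");
        }
    }

    public static void main(String[] args) {
        // A plain upsert table with no star-tree index passes validation.
        validateUpsertConfig(new TableConfig(true, List.of()));

        // Upserts combined with a star-tree index are rejected.
        boolean rejected = false;
        try {
            validateUpsertConfig(new TableConfig(
                true, List.of(new StarTreeIndexConfig(List.of("country")))));
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println(rejected ? "rejected" : "allowed");
    }
}
```

Failing fast at validation time, rather than at query time, is the standard pattern here: the conflict is detectable from the config alone, so there is no reason to let the table be created.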
At a cursory glance, I did not detect a hallucination. The answer was true (AFAIK), clear, and objective. I also see that you can peek into other resources to get some additional contextual information.
Very impressive work.
Hey I'm Silas, I worked on this. Lmk if you have any questions!
This seems like a really cool project and could be really useful in getting to know a new codebase. How do you ensure that the output in the wiki does not contain hallucinations?
When working with LLMs you can never rule out hallucinations entirely, but we've been carefully tuning the system over time and found it to be pretty high signal! The reason we display the code snippets is to make it easy to double-check against the source.
Can you "chunkify" output so that you can rate different elements independently? Like "This part of the answer is totally cool" and "Wait. This part right here includes a hallucination."
Then allow for feedback to be provided if an issue is spotted.
Hi Silas!
Are there any plans to allow running DeepWiki against private non-GitHub repos (e.g. GitLab, Azure DevOps, etc) and keep the output private?
This might be a bit off-topic but is this custom-built documentation or did you use a template? I've been on the hunt for pretty documentation tools but have only come up with Mintlify.
This is entirely custom-built, but we're also fans of Mintlify. We use them for docs.devin.ai.
Is it custom all the way down or based on something like starlight? I really like Mintlify but $200 per doc site hurts for internal tools.
I can see a lot of merit in custom building though, especially with making it easier to dump it into `llms.txt` or exposing a search for AI. Hoping that's where DeepWiki is headed :)
Great work Silas! Can this also be trained on externalized sites, such as human-written docs, our web site, public-facing Google docs, etc?
Currently it uses only the codebase itself. However, we've been thinking about adding other sources like docs. Can't promise any timelines yet.
Another idea: allow more "chunkified" approvals, and verbatim feedback:
"This part is correct, but this next paragraph? You're hallucinating / misinterpreting. What you really need to say is 'x'."
Also, I suggest that feedback itself be scored in some way. For instance, what if someone is sending malicious feedback?
How do you discourage or de-weight someone downvoting good answers while upvoting wrong answers?