results: publish Fufront-RyanX LongMemEval #18

Open
ryanx0621 wants to merge 3 commits into vectorize-io:main from ryanx0621:publish-fufront-ryanx-longmemeval

ryanx0621 commented Jun 2, 2026

Publishes the Fufront-RyanX result on the LongMemEval S split and adds a public evidence folder for memory-benchmark artifacts.

Evidence:

  • total_queries: 500
  • correct: 500
  • accuracy: 1.0
  • memory: ckb
  • answer_llm: corebrain:ckb-body-v1
  • judge_llm: openai:gpt-4o
  • oracle: false
  • blob sha: bc692b10877d44a8669bbd1c10eef09ae333530c06235217170389820497ef1a
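
The reported fields can be sanity-checked against each other: accuracy should equal correct / total_queries, and the blob sha should at least have the shape of a SHA-256 hex digest. A minimal sketch, assuming the values above are loaded into a plain dict; the dict layout and the `check_evidence` helper are hypothetical and are not the schema of EVIDENCE_PACKET.json:

```python
import re

# Values copied from the Evidence list above; the key names are an assumption.
evidence = {
    "total_queries": 500,
    "correct": 500,
    "accuracy": 1.0,
    "blob_sha": "bc692b10877d44a8669bbd1c10eef09ae333530c06235217170389820497ef1a",
}


def check_evidence(ev):
    """Return a list of inconsistencies found in the reported fields."""
    problems = []

    # accuracy must match correct / total_queries.
    expected = ev["correct"] / ev["total_queries"]
    if abs(expected - ev["accuracy"]) > 1e-9:
        problems.append(
            f"accuracy {ev['accuracy']} != correct/total_queries {expected}"
        )

    # A SHA-256 digest is 64 hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{64}", ev["blob_sha"]):
        problems.append("blob_sha is not a 64-character lowercase hex string")

    return problems


print(check_evidence(evidence))
```

This only confirms that the numbers agree with each other and that the digest is well-formed. It says nothing about whether the answers were actually produced as described, which still requires the run itself.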

Adds:

  • FuFront-LifeBrain-MEM/README.md
  • FuFront-LifeBrain-MEM/EVIDENCE_PACKET.json
  • FuFront-LifeBrain-MEM/PUBLIC_REPORT.md
  • FuFront-LifeBrain-MEM/REPRODUCTION.md
  • FuFront-LifeBrain-MEM/OPEN_SOURCE_PLAN.md
  • FuFront-LifeBrain-MEM/MANIFEST_SHA256.txt
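
A reader could check the evidence folder against MANIFEST_SHA256.txt roughly as follows. This is a sketch only: it assumes the manifest uses the common `sha256sum` layout (`<hex digest>  <relative path>` per line), which the PR does not confirm, and `verify_manifest` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest_path):
    """Return {relative_path: True/False} for each entry in the manifest.

    Assumes sha256sum-style lines; paths are resolved relative to the
    directory that contains the manifest.
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == digest.lower()
    return results
```

Calling `verify_manifest("FuFront-LifeBrain-MEM/MANIFEST_SHA256.txt")` from a checkout of the branch would then report which listed files are present and unmodified. Matching hashes show the artifacts are intact, not that the benchmark run is reproducible.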

Boundary:

  • This is public benchmark evidence, not an AGI claim.
  • This does not claim production write-back or canonical memory promotion.
  • Upstream leaderboard deployment remains pending until this PR is merged and the site is redeployed.

Also updates results-manifest.json, blob-manifest.json, and .blob_manifest.json.

vercel Bot commented Jun 2, 2026

Someone is attempting to deploy a commit to the Vectorize Team on Vercel.

A member of the Team first needs to authorize it.

nicoloboschi (Collaborator) left a comment

Hey, we need the code to reproduce it.

We will generate the results manifest from our own testing once you provide the code, thanks.
