FABBI AIOS
Technical Intelligence Brief
LLM / Coding Agent / AI SDLC
2026-05-28 09:06 ICT
Status: PARTIAL / 171 candidates

1 Executive Snapshot

171 candidates scanned
32 X/search fallback signals
20 YouTube signals
64 GitHub repo signals
0 fresh arXiv papers (HTTP 429)
  • Agent harness/runtime: 8/30 dev-web samples center on deterministic benchmarks, portable harnesses, and runtime OS → NEXA needs an internal test harness within 2 weeks.
  • CLI/IDE coding agents: 64 GitHub repos; manaflow-ai/cmux reaches 20,014 stars / 1,501 forks / 2,215 issues → high demand for orchestration, but high issue-load risk.
  • Sandbox/security: microsandbox 6,327 stars / 308 forks / 52 issues → enterprise adoption requires isolation before auto-PR.
  • Spec-as-source: the HN item "specs feel more like source code" has only 1 point but points in the right direction: coding-agent productivity depends on context/spec formalization → FARE should prioritize a repo+spec graph.
  • Social completeness: 3/4 social groups have signals (X/YT/Reddit); public Facebook yielded 0 usable items → overall confidence 72%, which does not block publishing.

2 KOL/OG Feed Watch

Platform | Author/Channel | Metric | URL | CTO takeaway
HN | GodelNumbering · 2026-04-27 | 393 pts / 148 comments | dirac | OSS agent claims the top TerminalBench result → needs re-benchmarking on Fabbi repos.
HN | sjhalani7 · 2026-05-27 | 6 pts / 3 comments | VAEN | Portable AI coding-agent harness → packaging pattern for NEXA eval.
HN | e2e4 · 2026-05-27 | 2 pts / 1 comment | DeepSWE | Frontier coding-agent measurement → complements benchmarks beyond SWE-bench.
GitHub | manaflow-ai · 2026-05-28 | 20,014 stars / 1,501 forks | cmux | Multi-agent/workflow orchestration is drawing adoption.
GitHub | gptme · 2026-05-28 | 4,311 stars / 390 forks | gptme | OSS CLI agent baseline for comparison.
Facebook | N/A | 0 usable | N/A (public search fallback returned no usable links) | Lowers social-market confidence by 8 points.

3 Trend Radar

  • HOT · Harness/eval runtime: 5 HN/GitHub signals related to TerminalBench/SWE-style benchmarks.
  • HOT · Agent sandboxes: microsandbox 6,327 stars.
  • EMERGING · Spec-as-source/context graph: 2 dev-web signals; low engagement but high strategic fit.
  • WATCH · Parallel Claude/coding workers: Claudeverse, 1 HN point; early.
  • NOISE · Vibe-code app launches: ≥3 HN items; low reusable enterprise value.

4 CTO Evaluation Matrix

Signal | Thesis | Evidence | Counter-signal | Fabbi implication | Confidence | Decision | Next validation
Harness-first agent | Agent value is shifting from model prompts to measurable harnesses. | VAEN, DeepSWE, Tracecore, TerminalBench mentions: 5+ signals. | Low HN engagement (1-6 pts), except Dirac (393 pts). | NEXA/SYNCA need an internal eval pack. | 78% | trial | 20 tasks from 2 customer repos; measure pass@1/cost/time.
Sandbox runtime | Enterprises can only scale when agent execution is permission-constrained. | microsandbox 6,327 stars; Amber capability runtime (HN). | Security maturity not yet sufficient for audit. | AIOS governance + SYNCA risk gate. | 74% | trial | PoC with locked filesystem/network across 5 workflows.
Context/spec graph | Spec/codebase memory is the main bottleneck. | Repowise, spec-as-source, implicit-knowledge threads: 3 signals. | Adoption data missing. | FARE should prioritize codebase intelligence. | 70% | adopt | Measure retrieval precision@10 on 3 repos.
OSS CLI baseline | OSS agents are good enough to serve as a control group. | gptme 4,311 stars; cmux 20,014 stars; orca 3,551 stars. | High cmux issue load (2,215). | Reduces Claude/Codex lock-in when benchmarking. | 76% | trial | Compare Claude Code/Codex/gptme on 30 tasks.
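The next-validation steps for the harness-first and OSS-baseline rows both reduce to per-agent pass@1, cost, and time over a shared task set. A minimal Python sketch of that aggregation follows, assuming one attempt per (agent, task) pair; the `TaskRun` record, its field names, and `summarize` are hypothetical and not part of any existing eval tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    """One agent attempt on one benchmark task (hypothetical record format)."""
    agent: str        # e.g. "claude-code", "codex", "gptme"
    task_id: str
    passed: bool      # did the single attempt pass the task's acceptance tests?
    cost_usd: float   # API/compute cost of the attempt
    seconds: float    # wall-clock time of the attempt


def summarize(runs: list[TaskRun]) -> dict[str, dict[str, float]]:
    """Per-agent pass@1, mean cost and mean time.

    Assumes exactly one attempt per (agent, task) pair, so pass@1 is the
    fraction of tasks whose attempt passed.
    """
    by_agent: dict[str, list[TaskRun]] = {}
    seen: set[tuple[str, str]] = set()
    for run in runs:
        key = (run.agent, run.task_id)
        if key in seen:
            raise ValueError(f"duplicate attempt for {key}")
        seen.add(key)
        by_agent.setdefault(run.agent, []).append(run)

    return {
        agent: {
            "tasks": len(agent_runs),
            "pass@1": sum(r.passed for r in agent_runs) / len(agent_runs),
            "mean_cost_usd": mean(r.cost_usd for r in agent_runs),
            "mean_seconds": mean(r.seconds for r in agent_runs),
        }
        for agent, agent_runs in by_agent.items()
    }
```

Running every agent against the same task IDs keeps the comparison vendor-neutral. If the harness later samples several attempts per task, averaging per-task pass rates (or an unbiased pass@k estimator) would replace the plain ratio.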

5 Repo Watch

Repo | Momentum metric | Risk | Move
cmux | 20,014 stars / 1,501 forks / 2,215 issues | Issue pressure (2,215) | Watch the architecture; do not adopt as-is.
microsandbox | 6,327 stars / 308 forks / 52 issues | Security audit N/A | PoC sandbox layer.
gptme | 4,311 stars / 390 forks / 11 issues | Enterprise controls N/A | Baseline CLI agent.
orca | 3,551 stars / 238 forks / 229 issues | 229 issues | Watch.
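The Risk column cites raw issue counts, but the repos differ widely in size. One hypothetical way to compare issue pressure is issues per 1,000 stars, computed below from this table's own figures; the normalization is an assumption, not a metric used elsewhere in this brief.

```python
# Star and issue counts copied from the Repo Watch table.
REPOS = {
    "cmux": (20_014, 2_215),
    "microsandbox": (6_327, 52),
    "gptme": (4_311, 11),
    "orca": (3_551, 229),
}


def issues_per_1k_stars(stars: int, issues: int) -> float:
    """Issue count normalized by popularity (hypothetical risk proxy)."""
    return issues / stars * 1000


# Highest issue pressure first.
ranked = sorted(REPOS, key=lambda name: issues_per_1k_stars(*REPOS[name]), reverse=True)

for name in ranked:
    stars, issues = REPOS[name]
    print(f"{name:13s} {issues_per_1k_stars(stars, issues):6.1f} issues / 1k stars")
```

On these numbers cmux (110.7) and orca (64.5) sit far above microsandbox (8.2) and gptme (2.6), which is consistent with the Watch moves for cmux and orca.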

6 Paper / Benchmark / Product Watch

  • arXiv (fresh): 0 collected; HTTP 429 returned 5 times → confidence in benchmark/paper claims reduced.
  • TerminalBench/DeepSWE: 4+ web signals; use as external eval inspiration, not as a direct KPI until reproduced.
  • Claude Code/Codex/Cursor/Devin/OpenCode: fresh official product changelogs were not captured in this run; track over the next 7 days.
  • Product move: OSS runtime/orchestration repos currently offer a faster test surface than vendor announcements.
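The arXiv collector gave up after repeated HTTP 429 (Too Many Requests) responses. A common mitigation is retrying with exponential backoff and jitter. The sketch below assumes a hypothetical `fetch` callable that raises a hypothetical `RateLimited` exception on 429, and takes `sleep` as a parameter so it can be tested without real waiting.

```python
import random
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical: raised by a fetch function when the server answers HTTP 429."""


def fetch_with_backoff(
    fetch: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch`, retrying on RateLimited with exponential backoff plus jitter.

    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ... by default) plus up to
    base_delay of random jitter so parallel collectors do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller mark the source PARTIAL
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")
```

If the server sends a `Retry-After` header with the 429, honoring that value is generally preferable to the computed delay.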

7 Impact Coverage

Domain | Now (0-2w) | Next (1-2m) | Later (3-6m) | Mode
FARE | Spec/codebase graph MVP; 3 repos. | Retrieval precision + agent context pack. | Cross-project knowledge memory. | adopt
NEXA | 20-task harness, 3 agents. | Sandboxed execution + auto-PR. | Customer-specific agent runtime. | trial
SYNCA | Risk rubric: code diff, test, secret, permission. | Human-in-loop gate. | Governed AI SDLC platform. | adopt
DOMUS | Monitor (low direct signal). | Apply only to internal automation. | Vertical agent workflows if ROI is proven. | monitor
Japan/VN/Global | Japan: security-first; VN: productivity pilots; Global: OSS orchestration. | Package a case study with a 15-25% cycle-time saving target. | Managed AI SDLC offering. | trial
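The SYNCA "Now" item names four risk dimensions: code diff, test, secret, and permission. A minimal sketch of how such a rubric could feed the human-in-loop gate is below. The diff-size threshold, secret regexes, and sensitive path prefixes are hypothetical placeholders; a production gate would rely on a dedicated secret scanner rather than three regexes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical secret patterns; a real gate should use a dedicated secret scanner.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(api[_-]?key|password|secret)\s*[:=]\s*\S+"),  # key = value literals
]

# Hypothetical paths whose changes imply permission/infra impact.
SENSITIVE_PREFIXES = (".github/workflows/", "infra/", "Dockerfile")

MAX_LINES_CHANGED = 400  # hypothetical diff-size threshold


@dataclass
class ChangeSummary:
    lines_changed: int
    tests_passed: bool
    added_text: str                                   # added lines of the diff, joined
    touched_paths: list[str] = field(default_factory=list)


def risk_flags(change: ChangeSummary) -> list[str]:
    """Return the rubric dimensions (diff, test, secret, permission) that are flagged."""
    flags = []
    if change.lines_changed > MAX_LINES_CHANGED:
        flags.append("diff-size")
    if not change.tests_passed:
        flags.append("tests")
    if any(p.search(change.added_text) for p in SECRET_PATTERNS):
        flags.append("secret")
    if any(path.startswith(SENSITIVE_PREFIXES) for path in change.touched_paths):
        flags.append("permission")
    return flags


def gate(change: ChangeSummary) -> str:
    """Route any flagged change to a human; only clean changes may proceed to auto-PR."""
    return "human-review" if risk_flags(change) else "auto-pr-eligible"
```

Returning the list of flags, not just a verdict, lets the reviewer see which rubric dimension triggered the human-in-loop step.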

8 CTO Recommendations (exactly 4)

1. Build Fabbi Agent Eval Pack v0
Why now: the 171-signal run shows the harness/runtime cluster is the strongest. ROI/time-saving: 15-25% shorter dev cycle if pass@1 improves by ≥10%. Risk: 2/5. Owner: Head of AI Eng. TTV: 10 working days. Validate: 20 tasks, 3 repos, pass@1/cost/time/security.
2. Ship sandboxed NEXA PoC
Why now: microsandbox's 6,327 stars signal demand for execution isolation. ROI/time-saving: 10-18% via safer auto-run tests. Risk: 3/5. Owner: Platform Lead. TTV: 2 weeks. Validate: 5 workflows with network access, secrets, and filesystem writes blocked.
3. FARE context/spec graph pilot
Why now: spec-as-source and repo-intelligence signals point to a context bottleneck. ROI/time-saving: 12-20% fewer clarification loops. Risk: 2/5. Owner: Solution Architect. TTV: 3 weeks. Validate: precision@10, answer groundedness, hallucination rate.
4. Vendor-neutral coding-agent benchmark
Why now: an OSS control group avoids Claude/Codex lock-in. ROI/time-saving: 8-15% gain in procurement and tooling efficiency. Risk: 2/5. Owner: CTO Office. TTV: 1 week. Validate: Claude Code vs Codex vs gptme on the same 30 tasks.
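Recommendation 3 validates the FARE pilot with precision@10, which is also the next-validation step for the context/spec graph in the evaluation matrix. A minimal sketch of the metric follows, assuming each query has a hand-labeled set of relevant file or document IDs; the function names and IDs are illustrative.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved IDs that are labeled relevant.

    Divides by k even when fewer than k items come back, so short result
    lists are penalized (one common convention).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(queries: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average precision@k over (retrieved, relevant) pairs, one pair per query."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(precision_at_k(r, rel, k) for r, rel in queries) / len(queries)
```

Some teams divide by `min(k, len(retrieved))` instead of `k`; whichever convention is chosen should stay fixed across the 3 pilot repos so the numbers remain comparable.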

9 Must-read Sources

S01 HN · Dirac OSS agent / TerminalBench claim
P0: 393 pts / 148 comments. Read to design the benchmark control.

S02 GitHub · manaflow-ai/cmux
P0: 20,014 stars / 1,501 forks. Orchestration signal.

S03 GitHub · microsandbox
P0: sandbox primitive for agent execution.

S04 HN · DeepSWE
P1: frontier coding-agent measurement.

S05 HN · VAEN portable harness
P1: packaging harness concept.

S06 Blog · Specs feel more like source code
P1: context/spec operating model.

10 Data Quality / Scan Health Appendix

Plan executed: manifest → collectors → normalize → gates → score → Vietnamese synthesis → HTML → Cloudflare deploy.
Counts: total 171; dev_web/HN 30; GitHub 64; Reddit 25; YouTube 20; X 32 (via search fallback); papers/product 0 (arXiv HTTP 429); public Facebook 0 (no usable links).
Gates: source volume PASS; social completeness PASS (3/4); source links PASS; papers/product PARTIAL.
Confidence: 72% overall; paper/product claims are limited.
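The gate step can be sketched as below using this run's counts. The thresholds (`MIN_TOTAL`, `MIN_SOCIAL_GROUPS`) and the rule that zero papers yields PARTIAL rather than FAIL are assumptions chosen to reproduce the reported outcome, not the pipeline's actual configuration. The source-links gate is omitted because it needs per-item link data.

```python
# Per-source counts from this run (Counts line of the appendix).
COUNTS = {
    "dev_web_hn": 30, "github": 64, "reddit": 25, "youtube": 20,
    "x": 32, "papers_product": 0, "facebook": 0,
}
SOCIAL_SOURCES = ("x", "youtube", "reddit", "facebook")

# Hypothetical thresholds; the brief does not state the real ones.
MIN_TOTAL = 100
MIN_SOCIAL_GROUPS = 3


def run_gates(counts: dict[str, int]) -> tuple[int, str, dict[str, str], str]:
    """Return (total, social coverage, per-gate results, overall status)."""
    total = sum(counts.values())
    social_hits = sum(1 for s in SOCIAL_SOURCES if counts.get(s, 0) > 0)
    gates = {
        "source_volume": "PASS" if total >= MIN_TOTAL else "FAIL",
        "social_completeness": "PASS" if social_hits >= MIN_SOCIAL_GROUPS else "FAIL",
        # Assumed rule: missing papers degrade the run instead of blocking it.
        "papers_product": "PASS" if counts.get("papers_product", 0) > 0 else "PARTIAL",
    }
    if "FAIL" in gates.values():
        status = "FAIL"       # any hard failure blocks publishing
    elif "PARTIAL" in gates.values():
        status = "PARTIAL"    # publishable, with reduced confidence
    else:
        status = "PASS"
    return total, f"{social_hits}/{len(SOCIAL_SOURCES)}", gates, status
```

With this run's counts the sketch yields a total of 171, social coverage 3/4, and an overall PARTIAL status, matching the header.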