AIScope SG · Public-Interest Research

Methodology — government-grade transparency

This page documents how AIScope SG turns Singapore occupation statistics into an auditable AI exposure index, how the 2026 scoring model reasons about cognition vs. physical work, and how GraphRAG-style triples suggest transition paths.

Exposure calculation

Each occupation ships with a structured payload (wages, employment, SSOC metadata, PWM/regulated flags, and a short task narrative). A calibrated reasoning model estimates how much of the documented task bundle is already automatable with frontier AI systems, producing a raw exposure score $S_{raw}\in[0,10]$. The dashboard does not treat that number as final: before publication, validate_data.py merges it with deterministic policy gates.

For roles outside PWM coverage, the regulatory discount has the compact form below (the deployed pipeline applies the PWM cap as a separate step):

$$S_{final} = S_{raw} \times w_{reg}$$

Here $w_{reg}\in(0,1]$ encodes licensing moats for desk-heavy professions where MOH clinical governance, MAS financial-stability rules, or legal practice management materially slow automation, even when the raw task text looks “GPT-friendly”. Because $w_{reg}\le 1$, the discount can only lower a score, and the coded caps are applied afterwards so that the model's narrative cannot outrank the code.

Mitigation factors (Singapore policy stack)

PWM (Progressive Wage Model): covered occupations receive a hard ceiling of 4.0 on the public score, so that wage-floor policy and AI substitution risk are not conflated in the headline index.

Regulated sectors: MOH (clinical and patient-safety workflows), MAS (financial market infrastructure and conduct regulation), SAL (legal practice) and BCA-adjacent licensing (built-environment assurance) each impose additional human-in-the-loop obligations. These levers enter the score through $w_{reg}$ and/or the PWM cap rather than through prose-only disclaimers.

1. Scoring algebra (PWM & regulation)

The public dashboard shows a calibrated score per occupation. The LLM proposes a raw automatability score $S_{raw}\in[0,10]$. Policy levers (PWM wage floors, licensing) are enforced as hard caps in code, not as post-hoc narrative.

For exposition we write a compact multiplicative intuition (the deployed pipeline applies caps in validate_data.py and in the LLM system prompt):

$$S_{final} = \min\left(c_{pwm},\; S_{raw} \cdot w_{reg}\right),\qquad c_{pwm} = \begin{cases} 4.0 & \text{if PWM-covered} \\ 10 & \text{otherwise} \end{cases}$$

where $c_{pwm}$ enforces the PWM ceiling of 4.0 for covered roles, and $w_{reg}<1$ discounts desk risk when statutory licensing is binding (SAL / MOH / MAS contexts), with $w_{reg}=1$ otherwise. Because the 4.0 ceiling applies only to PWM-covered roles, non-PWM cognitive roles can approach the top of the scale under the 2026 reasoning-model calibration (long-horizon document chains).

Vulnerability index (0–1)

The concierge / graph bundle also surfaces a companion metric separate from the headline 0–10 AI score. It is computed deterministically in scripts/generate_graph.py as a near-term displacement pressure gauge:

$$\text{Vulnerability}=\mathrm{clamp}_{[0,1]}\left(\frac{S_{final}}{10}-0.2\cdot\mathbb{1}_{PWM}-0.15\cdot\mathbb{1}_{Reg}+0.08\cdot\mathbb{1}_{WFH}\right)$$

PWM and licensing flags lower the index because wage floors and statutory human-in-the-loop duties dampen immediate displacement risk even when cognitive automation is high. Remote-friendly roles receive a small upward nudge (coordination and oversight fragility). Use the 0–10 score to read automation exposure; use vulnerability to rank which occupations should be prioritised for transition and retraining support (Results-style triage).
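
The vulnerability formula is simple enough to check by hand, and the sketch below transcribes it term by term. The helper name `vulnerability` is hypothetical (it is not the function in scripts/generate_graph.py), and the flags are passed as booleans on the assumption that the indicator terms come from the row's PWM, regulated, and WFH flags.

```python
def vulnerability(s_final: float, pwm: bool, regulated: bool, wfh: bool) -> float:
    """Companion 0-1 index; a transcription of the formula, not the deployed code."""
    raw = (
        s_final / 10.0
        - 0.20 * int(pwm)        # PWM wage floor dampens near-term displacement
        - 0.15 * int(regulated)  # statutory human-in-the-loop duty
        + 0.08 * int(wfh)        # remote-friendly roles nudged upward
    )
    return min(1.0, max(0.0, raw))  # clamp to [0, 1]


# Made-up inputs, not dashboard rows:
print(vulnerability(8.5, pwm=False, regulated=False, wfh=True))
print(vulnerability(4.0, pwm=True, regulated=False, wfh=False))
print(vulnerability(1.0, pwm=True, regulated=True, wfh=False))  # clamps to 0.0
```

The clamp matters at both ends: a low-exposure role carrying both PWM and licensing flags would otherwise go negative, and a remote-friendly role at the top of the scale would exceed 1.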

2. Data pipeline (MOM → model → D3)

End-to-end delivery: open-data ingest, model scoring with audit artefacts, schema validation, graph generation, and the static GitHub Pages bundle.

flowchart TB
  API[data.gov.sg datastore_search] --> FETCH[run_pipeline.py --fetch]
  FETCH --> RAW[data/raw/wages_fetched.json]
  RAW --> EXP[step4_export → web/data/data.json]
  EXP --> LLM[Claude 3.5 Sonnet scoring + audit]
  LLM --> VAL[validate_data.py + audit_report]
  VAL --> GRAPH[generate_graph.py]
  GRAPH --> TRIP[triples.jsonl + kg_indices.jsonl]
  EXP --> UI[D3 treemap + drawer + i18n]
  TRIP --> UI
  VAL --> CI[GitHub Actions / gh-pages]
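
One way to sanity-check the flowchart is to treat it as a dependency graph and ask for a valid execution order. The sketch below copies the node names from the diagram into a hypothetical stage map (it does not describe how run_pipeline.py actually orchestrates its steps) and uses the standard-library `graphlib.TopologicalSorter`, which also rejects accidental cycles.

```python
from graphlib import TopologicalSorter

# Stage -> set of stages it waits on; node names copied from the flowchart.
PIPELINE = {
    "FETCH": {"API"},
    "RAW": {"FETCH"},
    "EXP": {"RAW"},
    "LLM": {"EXP"},
    "VAL": {"LLM"},
    "GRAPH": {"VAL"},
    "TRIP": {"GRAPH"},
    "UI": {"EXP", "TRIP"},  # the front end needs both data.json and the triples
    "CI": {"VAL"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(" -> ".join(run_order(PIPELINE)))
```

The useful property is that UI can only be built after both the export and the triples exist, which the diagram expresses with two incoming arrows.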

3. Scoring logic — physical vs cognitive

flowchart TD
  O[Occupation payload] --> Q{PWM-covered?}
  Q -->|Yes| C[PWM hard cap 4.0]
  Q -->|No| R{Regulated profession?}
  R -->|Yes| L[Apply licensing moat discount]
  R -->|No| K[Cognitive exposure pathway]
  K --> T[2026 CoT / cross-document risk uplift]
  L --> M[Merge + SkillsFuture transition text]
  T --> M
  C --> M
  M --> V[Validated JSON row]
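
The decision tree above can be sketched as a small gate function. This is a minimal illustration, assuming that the licensing discount is applied before the PWM cap (as in the section 1 algebra) and that any 2026 chain-of-thought uplift is already folded into the model's raw score; the `Occupation` type, `final_score`, and the sample `w_reg` values are hypothetical, not the code in validate_data.py.

```python
from dataclasses import dataclass

PWM_CAP = 4.0     # PWM ceiling stated on this page
SCALE_MAX = 10.0  # top of the headline 0-10 scale


@dataclass
class Occupation:
    name: str
    s_raw: float        # model-proposed raw score, expected in [0, 10]
    pwm: bool           # PWM-covered?
    regulated: bool     # licensing flag (MOH / MAS / SAL / BCA contexts)
    w_reg: float = 1.0  # hypothetical licensing discount in (0, 1]


def final_score(occ: Occupation) -> float:
    """Apply the deterministic gates to the model's raw score (illustrative)."""
    s = min(max(occ.s_raw, 0.0), SCALE_MAX)  # never trust out-of-range model output
    if occ.regulated:
        if not 0.0 < occ.w_reg <= 1.0:
            raise ValueError(f"{occ.name}: w_reg must lie in (0, 1]")
        s *= occ.w_reg                       # licensing-moat discount
    if occ.pwm:
        s = min(s, PWM_CAP)                  # hard cap, whatever the narrative says
    # Cognitive pathway: any 2026 uplift is assumed to be inside s_raw already.
    return s


rows = [  # made-up inputs
    Occupation("desk role", 8.5, pwm=False, regulated=False),
    Occupation("PWM role", 6.0, pwm=True, regulated=False),
    Occupation("licensed role", 8.0, pwm=False, regulated=True, w_reg=0.5),
]
for occ in rows:
    print(occ.name, final_score(occ))
```

Keeping the gates in plain code like this is what lets the published score stay reproducible even if the model's wording changes between runs.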

4. GraphRAG retrieval for pivots

flowchart LR
  S[Skill tokens from risk_factor] --> G[NetworkX graph]
  G --> E1[REQUIRES_SKILL edges]
  G --> E2[TRANSFER_PATH high to low AI score]
  E2 --> T[triples.jsonl]
  T --> U[Frontend: dashed transfer arcs]
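
The retrieval step can be illustrated with a standard-library stand-in for the NetworkX graph. This sketch assumes skill tokens have already been extracted from risk_factor and that a TRANSFER_PATH needs at least one shared skill; the `head`/`relation`/`tail` field names, the `min_shared` rule, and the demo data are hypothetical, not the actual triples.jsonl schema.

```python
import json
from itertools import permutations


def build_triples(occupations: dict, min_shared: int = 1) -> list[dict]:
    """occupations: {name: {"score": float, "skills": set[str]}} (hypothetical shape)."""
    triples = []
    for name in sorted(occupations):
        for skill in sorted(occupations[name]["skills"]):
            triples.append({"head": name, "relation": "REQUIRES_SKILL", "tail": skill})
    # TRANSFER_PATH only points from a higher to a lower AI score via shared skills.
    for src, dst in permutations(sorted(occupations), 2):
        a, b = occupations[src], occupations[dst]
        shared = a["skills"] & b["skills"]
        if a["score"] > b["score"] and len(shared) >= min_shared:
            triples.append({
                "head": src, "relation": "TRANSFER_PATH", "tail": dst,
                "shared_skills": sorted(shared),
            })
    return triples


def to_jsonl(triples: list[dict]) -> str:
    """One JSON object per line, the layout a .jsonl file expects."""
    return "".join(json.dumps(t, ensure_ascii=False) + "\n" for t in triples)


demo = {  # made-up scores and skill tokens
    "Bookkeeper": {"score": 8.0, "skills": {"reconciliation", "spreadsheets"}},
    "Audit associate": {"score": 6.0, "skills": {"reconciliation", "client liaison"}},
    "Enrolled nurse": {"score": 3.0, "skills": {"patient care"}},
}
print(to_jsonl(build_triples(demo)), end="")
```

Requiring a shared skill keeps the dashed arcs interpretable: every suggested pivot can be explained by at least one skill the worker already has.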

5. Worked example — “Accountant” class desk role

Interactive narrative (illustrative)

Consider a median-wage accounting / bookkeeping style occupation (SSOC clerical/professional mix in the dataset). The 2026 model elevates exposure because chain-of-thought systems already handle multi-voucher reconciliation, intercompany eliminations, and policy cross-checks across PDF + spreadsheet sources.

Inputs

Median gross wage & employment weights
PWM / regulated flags from occupation name heuristics
Singapore multilingual front-line factor (weaker for pure back-office)

Output shape

$S_{raw}$ is high when the tasks are mostly structured cognition
If PWM-covered: hard clamp $S_{final}\le 4$, regardless of the narrative
The reason field must cite a SkillsFuture transition path

The live treemap colour encodes $S_{final}$ after caps; the drawer shows source_meta.llm_model and wage year for reproducibility.
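
To make the companion vulnerability metric concrete for this example, suppose (hypothetically; these are not dashboard values) that the accountant row publishes with $S_{final}=8.0$, carries neither a PWM nor a licensing flag, and is remote-friendly. The vulnerability formula then gives:

$$\text{Vulnerability}=\mathrm{clamp}_{[0,1]}\left(\frac{8.0}{10}-0.2\cdot 0-0.15\cdot 0+0.08\cdot 1\right)=0.88$$

By contrast, a hypothetical PWM-covered, office-bound role capped at $S_{final}=4.0$ lands at $\mathrm{clamp}_{[0,1]}(0.4-0.2)=0.20$, well below the accountant on the transition-priority ranking.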

6. Data compliance & official provenance

API lineage

Where configured, wage and employment tabular extracts are pulled through the data.gov.sg production datastore_search API using the x-api-key header. If the key is absent or the service is temporarily unavailable, the repository pipeline falls back to a bundled reference extract under data/raw/wages_fallback.json so automated builds remain deterministic.
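
The fetch-with-fallback behaviour can be sketched with the standard library alone. The endpoint URL, the `limit` parameter, and the CKAN-style `{"success": ..., "result": {"records": [...]}}` envelope are assumptions about the data.gov.sg API rather than details taken from this page, and `load_wage_records` is a hypothetical helper; only the x-api-key header and the fallback path come from the text above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint; the page only names data.gov.sg's datastore_search API.
DATASTORE_URL = "https://data.gov.sg/api/action/datastore_search"


def load_wage_records(resource_id: str, api_key: Optional[str], fallback_path: str,
                      timeout: float = 30.0):
    """Return (records, source), preferring the API and falling back to the bundled extract."""
    if api_key:
        query = urllib.parse.urlencode({"resource_id": resource_id, "limit": 5000})
        request = urllib.request.Request(
            f"{DATASTORE_URL}?{query}", headers={"x-api-key": api_key}
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                payload = json.load(response)
            if payload.get("success"):
                # CKAN-style envelope assumed: {"success": true, "result": {"records": [...]}}
                return payload["result"]["records"], "api"
        except (OSError, ValueError, KeyError, AttributeError):
            # URLError/HTTPError/timeouts are OSError subclasses; bad JSON is a ValueError.
            pass
    # No key, network failure, or unexpected payload: use the bundled reference extract.
    with open(fallback_path, encoding="utf-8") as fh:
        return json.load(fh), "fallback"
```

Returning the source label alongside the records makes it easy to stamp each build with whether it used live API data or the bundled extract.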

Licence

Open datasets on data.gov.sg are published under the Singapore Open Data Licence. AIScope SG redistributes only aggregated, transformed indicators in web/data/data.json and does not republish full MOM source tables beyond what the licence permits for the retrieved API responses.

Update cadence

Data are synced with MOM/SSOC-aligned databases via the production API whenever operators run python run_pipeline.py --fetch (or the equivalent GitHub Actions step). The dashboard's meta.generated_at field records when the hierarchy was last exported; snapshot archives under data/processed/snapshots/ support wage-drift checks in generate_insights.py.
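
A wage-drift check between two snapshots might look like the sketch below. It assumes each snapshot can be reduced to a `{occupation: median wage}` mapping; the function name `wage_drift` and the 10% threshold are hypothetical and not taken from generate_insights.py.

```python
def wage_drift(previous: dict[str, float], current: dict[str, float],
               threshold: float = 0.10) -> dict[str, float]:
    """Relative wage change for occupations in both snapshots, kept if |change| > threshold."""
    drift = {}
    for occupation in sorted(previous.keys() & current.keys()):
        old, new = previous[occupation], current[occupation]
        if old <= 0:
            continue  # no meaningful relative change from a zero or negative base
        change = (new - old) / old
        if abs(change) > threshold:
            drift[occupation] = round(change, 4)
    return drift


# Made-up snapshots; "D" appears only in the newer one and is ignored.
print(wage_drift({"A": 5000, "B": 4000, "C": 3000},
                 {"A": 5100, "B": 4800, "C": 2500, "D": 6000}))
# {'B': 0.2, 'C': -0.1667}
```

Comparing only occupations present in both snapshots avoids reporting SSOC additions or removals as wage movements.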

Wage data: MOM Occupational Wages 2024 (official), with 2025 values projected at a +5.5% growth rate per the MOM Labour Force Report 2025. Official 2025 occupation-level tables are expected in August 2026.
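
The projection is a single multiplicative uplift. A minimal sketch, assuming the same +5.5% rate is applied to every occupation's 2024 wage (the function name `project_2025` is hypothetical):

```python
GROWTH_2025 = 0.055  # +5.5%, the rate stated in the wage-data note


def project_2025(wages_2024: dict[str, float], growth: float = GROWTH_2025) -> dict[str, float]:
    """Apply one uniform growth rate to every occupation's 2024 wage."""
    return {occ: round(w * (1 + growth), 2) for occ, w in wages_2024.items()}


print(project_2025({"example occupation": 5000.0}))
```

Because every wage is multiplied by the same factor, relative wage rankings between occupations are unchanged; only the levels move until the official 2025 tables replace the projection.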

Employment headline on the dashboard

Per-occupation employment weights are scaled proportionally in pipeline/step4_export.py so that meta.total_employment matches a configurable national anchor (environment variable AISCOPE_TARGET_TOTAL_EMPLOYMENT, default 3.72M). This preserves the relative SSOC weights while aligning the public total with the size of Singapore's workforce, and it corrects the inflation that synthetic expansion would otherwise cause by stacking multi-year industry totals into occupation rows.
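
The rescaling amounts to one common factor, $\text{target} / \sum_i e_i$, applied to every row. The sketch below reads the documented environment variable with the documented default; the function name `scale_employment` and the dict-of-rows input are hypothetical, not the code in step4_export.py.

```python
import os
from typing import Optional

DEFAULT_TARGET = 3_720_000.0  # documented default anchor (3.72M)


def scale_employment(rows: dict[str, float], target: Optional[float] = None) -> dict[str, float]:
    """Rescale employment so it sums to the national anchor, keeping relative weights."""
    if target is None:
        target = float(os.environ.get("AISCOPE_TARGET_TOTAL_EMPLOYMENT", DEFAULT_TARGET))
    total = sum(rows.values())
    if total <= 0:
        raise ValueError("employment weights must sum to a positive number")
    factor = target / total  # one factor for every row, so ratios are preserved
    return {occ: count * factor for occ, count in rows.items()}


scaled = scale_employment({"occupation A": 100.0, "occupation B": 300.0}, target=4000.0)
print(scaled)  # {'occupation A': 1000.0, 'occupation B': 3000.0}
```

The scaled counts stay fractional here; rounding them to whole headcounts while keeping the exact total would need an extra step such as largest-remainder rounding.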
