The Scoring Algorithm
We score 700+ articles per batch without a single API call. Here’s the exact system, from keyword matching to semantic embeddings to freshness decay.
How It Works
Every article gets a 0-10 score based on six signals: keyword tiers, source multipliers, trending boosts, category tagging, freshness decay, and (optionally) semantic embedding similarity. The keyword-only pipeline runs in under 1 second on a laptop. Adding embeddings bumps that to about 4 seconds on a MacBook M-series with no GPU.
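In sketch form, the signals combine like this: keyword points and the optional semantic bonus add, everything else multiplies. (This is a compressed sketch; the full score_article implementation near the end of the post is the authoritative version.)
def combine(keyword_score: float, semantic_bonus: float, source_mult: float,
            trending_mult: float, freshness: float, category_boost: float) -> float:
    # Additive points first, then the multipliers, capped at 10.
    raw = keyword_score + semantic_bonus
    return min(raw * source_mult * trending_mult * freshness * category_boost, 10.0)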
Tier 1: Paradigm Shift Keywords (+2.0 each, max 8 points)
These signal fundamental changes in how AI work gets done:
PARADIGM_SHIFT_SIGNAL = [
"ai first", "ai-first", "before human", "replaces", "replacing",
"fundamentally", "paradigm", "game changer", "disrupt",
"transform", "revolution", "reimagin", "rethink", "new era",
"without devops", "without engineers", "no-code", "zero-code",
"autonomous", "self-driving", "self-healing", "self-improving",
"solo founder", "one-person", "small team", "lean team",
"without hiring", "replacing team", "ai employee",
"6 engineers", "10 engineers", "no devops", "no ops",
"competitor", "competes with", "alternative to", "vs ",
"openclaw", "defenseclaw", "new player", "just launched",
"stealth", "backed by", "raised", "funding round",
]
Tier 2: Actionable Keywords (+1.0 each, max 6 points)
Things you can use or implement today:
ACTIONABLE_SIGNAL = [
"claude", "computer use", "claude code", "anthropic",
"openai", "gpt-4", "gpt-5", "codex", "gemini",
"openclaw", "mcp server", "model context protocol",
"just released", "now available", "launching", "announces",
"use case", "real-world", "in production", "case study",
"we built", "how we", "how i", "our experience",
"framework", "methodology", "playbook",
"vibe coding", "coding agent", "pr review",
"scale", "scaling", "growth", "revenue", "customers",
"startup", "founder", "entrepreneur",
"pricing", "cost", "cheaper", "faster", "efficient",
]
Tier 3: Infrastructure Keywords (+0.3 each, max 2 points)
Important context but less immediately actionable:
INFRA_SIGNAL = [
"agent", "agentic", "langchain", "llamaindex", "crewai",
"llm", "prompt engineer", "rag", "api", "sdk",
"open source", "hugging face", "mistral", "llama",
"deploy", "ship", "launch", "release", "production",
"template", "tutorial", "guide", "how to",
]
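To make the caps concrete, here's a worked example on a hypothetical title (assuming the three lists above are in scope):
title = "solo founder replaces 10 engineers with an autonomous coding agent"
t1 = sum(1 for kw in PARADIGM_SHIFT_SIGNAL if kw in title)  # 4 hits: "replaces", "autonomous", "solo founder", "10 engineers"
t2 = sum(1 for kw in ACTIONABLE_SIGNAL if kw in title)      # 2 hits: "founder", "coding agent"
t3 = sum(1 for kw in INFRA_SIGNAL if kw in title)           # 1 hit: "agent"
keyword_score = min(t1 * 2.0, 8.0) + min(t2 * 1.0, 6.0) + min(t3 * 0.3, 2.0)
# = 8.0 + 2.0 + 0.3 = 10.3 before multipliers and the final 10-point cap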
Penalties
Research Paper Penalty
RESEARCH_INDICATORS = [
"arxiv", "we propose", "we present", "ablation study",
"empirical", "theoretical", "in this paper",
"markov", "bayesian", "stochastic",
]
# 3+ matches = 0.2x multiplier (80% reduction)
# 2 matches = 0.4x
# 1 match = 0.7x
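As a standalone helper, the penalty looks like this (equivalent to the inline version in the full pipeline below):
def research_penalty(text: str) -> float:
    """Multiplier applied when an article reads like an academic paper."""
    hits = sum(1 for kw in RESEARCH_INDICATORS if kw in text)
    if hits >= 3:
        return 0.2  # 80% reduction
    if hits == 2:
        return 0.4
    if hits == 1:
        return 0.7
    return 1.0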
Skip Patterns (user-trained)
SKIP_PATTERNS = [
r"\bvllm\b", r"\bconcurrent request",
r"\bbeginner\b", r"\bintroduction to\b",
r"\bspeed up.*python\b",
]
# Matched articles score 0.5/10
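These are regular expressions, not plain substrings, so word boundaries apply. A quick check (is_skipped is a hypothetical helper; the real pipeline inlines this logic):
import re

def is_skipped(text: str) -> bool:
    """True if any user-trained skip pattern matches the lowercased text."""
    return any(re.search(p, text.lower()) for p in SKIP_PATTERNS)

is_skipped("Introduction to vLLM for beginners")  # True -> scored 0.5/10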
Source Multipliers
Boosted Sources (user’s preferred)
SOURCE_BOOST = {
"Vercel": 1.6, "SaaStr": 1.5, "Product Hunt": 1.5,
"Anthropic": 1.5, "OpenAI": 1.5, "Claude": 1.5,
"LangChain": 1.4, "Cursor": 1.5, "Latent Space": 1.4,
}
Penalized Sources
SOURCE_PENALTY = {
"ArXiv": 0.4, "IEEE": 0.5, "ACM": 0.5,
"KDnuggets": 0.7,
}
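Lookup is boost-first, then penalty, so a penalized source overrides any boost (the same logic the full pipeline uses inline):
def source_multiplier(source: str) -> float:
    """1.0 for unknown sources; boosted or penalized otherwise."""
    mult = SOURCE_BOOST.get(source, 1.0)
    return SOURCE_PENALTY.get(source, mult)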
Trending Boost
# HN engagement
if points >= 500: score *= 1.8
elif points >= 200: score *= 1.5
elif points >= 100: score *= 1.3
# Tier bonus
tier_bonus = {0: 3.0, 1: 2.0, 2: 1.0, 3: 0.0, 4: 0.0}
Tier 0 sources (curated newsletters) get a +3.0 base bonus because they've already been filtered by smart humans.
Category-Based Scoring
Not all articles compete in the same pool. We tag each article into a category and apply per-category score adjustments. This prevents “startup funding” articles from always beating “open source tool release” articles just because funding stories have splashier keywords.
CATEGORY_RULES = {
"product_launch": {
"signals": ["just launched", "now available", "announcing", "introduces"],
"boost": 1.3,
"min_per_digest": 2, # always include at least 2
"max_per_digest": 5,
},
"founder_story": {
"signals": ["we built", "how we", "our experience", "solo founder", "bootstrapped"],
"boost": 1.2,
"min_per_digest": 1,
"max_per_digest": 3,
},
"funding_news": {
"signals": ["raised", "series a", "series b", "funding round", "backed by"],
"boost": 1.0, # no boost, just capped
"min_per_digest": 0,
"max_per_digest": 2, # don't let funding stories dominate
},
"tutorial": {
"signals": ["how to", "tutorial", "step by step", "guide", "walkthrough"],
"boost": 1.1,
"min_per_digest": 2,
"max_per_digest": 4,
},
"opinion": {
"signals": ["i think", "unpopular opinion", "hot take", "controversial"],
"boost": 1.15,
"min_per_digest": 1,
"max_per_digest": 3,
},
}
def categorize_article(title: str, summary: str) -> str:
text = f"{title} {summary}".lower()
best_cat, best_count = "general", 0
for cat, rules in CATEGORY_RULES.items():
hits = sum(1 for s in rules["signals"] if s in text)
if hits > best_count:
best_cat, best_count = cat, hits
return best_cat
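A quick spot check of the categorizer (made-up titles, empty summaries):
categorize_article("We raised a $12M Series A to build AI agents", "")   # -> "funding_news"
categorize_article("Step by step guide: how to ship an MCP server", "")  # -> "tutorial"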
After scoring, we sort within categories and enforce the min/max constraints. This keeps your digest balanced. Nobody wants 15 funding announcements and zero tutorials.
Freshness Decay
Older articles get scored down. Simple exponential decay:
import math
from datetime import datetime, timezone
def freshness_multiplier(published_at: datetime, half_life_hours: float = 36.0) -> float:
"""
Returns a multiplier between 0.1 and 1.0.
At half_life_hours, the multiplier is 0.5.
With the default half-life, the 0.1 floor kicks in at roughly five days old.
"""
now = datetime.now(timezone.utc)
age_hours = (now - published_at).total_seconds() / 3600
if age_hours < 0:
return 1.0 # future-dated, treat as fresh
decay = math.exp(-0.693 * age_hours / half_life_hours) # ln(2) ≈ 0.693
return max(0.1, decay)
Why 36 hours? We tested a few values. At 24 hours, good articles from yesterday afternoon got buried. At 48 hours, stale stuff hung around too long. 36 is the sweet spot for a daily digest.
The decay curve looks like this:
- 0 hours old: 1.0x (full score)
- 12 hours: 0.79x
- 24 hours: 0.63x
- 36 hours: 0.50x
- 48 hours: 0.40x
- 72 hours: 0.25x
- 168 hours (7 days): 0.1x (floor)
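You can reproduce that table directly from freshness_multiplier:
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
for hours in (0, 12, 24, 36, 48, 72, 168):
    # published `hours` ago -> decay multiplier
    print(f"{hours:>3}h old -> {freshness_multiplier(now - timedelta(hours=hours)):.2f}x")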
Semantic Scoring with Embeddings
Keywords work. But they miss stuff. An article titled “Why Cursor Won the IDE War” has zero keyword matches for “coding agent” or “vibe coding” but it’s clearly relevant. Semantic scoring catches these.
Setup (5 minutes)
pip install sentence-transformers
First run downloads the model (~80MB). After that, it loads from cache.
Pick Your Model
We tested three options. Here’s what we found:
| Model | Size | Speed (700 articles) | Quality |
|---|---|---|---|
| all-MiniLM-L6-v2 | 22M params, 384 dims | 1.8s | Good enough |
| all-mpnet-base-v2 | 109M params, 768 dims | 3.4s | Noticeably better |
| BAAI/bge-small-en-v1.5 | 33M params, 384 dims | 2.1s | Best bang for buck |
We use all-MiniLM-L6-v2 in production. It’s tiny, fast, and 90% as good as the bigger models for our use case. If you’re scoring fewer than 200 articles per batch, all-mpnet-base-v2 is worth the extra 1.6 seconds.
How It Works
You define a set of “ideal article” descriptions. The system encodes those into vectors once. Then for each incoming article, it encodes the title + summary and computes cosine similarity against your ideal set. High similarity = high relevance.
from sentence_transformers import SentenceTransformer
import numpy as np
class SemanticScorer:
def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
self.model = SentenceTransformer(model_name)
# Define what a "perfect article" looks like for YOUR newsletter.
# Be specific. These aren't keywords, they're descriptions.
self.ideal_articles = [
"A startup built an AI agent that replaces a specific job function and shows real revenue numbers",
"New developer tool for AI-assisted coding ships today with benchmarks against competitors",
"Founder shares exact growth playbook with specific numbers and what actually worked",
"Company migrates from traditional architecture to AI-first and shares the results",
"Open source project launches that lets you run AI models locally without cloud dependency",
"Practical guide showing how to build something with Claude or GPT with working code",
"Analysis of why a specific AI company is winning or losing with concrete evidence",
]
# Encode once, reuse forever
self.ideal_embeddings = self.model.encode(self.ideal_articles, normalize_embeddings=True)
def score(self, title: str, summary: str) -> float:
"""
Returns 0.0-1.0 semantic similarity to ideal articles.
"""
text = f"{title}. {summary}"
embedding = self.model.encode([text], normalize_embeddings=True)
# Cosine similarity against each ideal article
similarities = np.dot(embedding, self.ideal_embeddings.T)[0]
# Take the max similarity (best match to any ideal)
best_match = float(np.max(similarities))
# Also factor in average of top 3 (rewards broad relevance)
top_3_avg = float(np.mean(np.sort(similarities)[-3:]))
# Weighted: 70% best match, 30% top-3 average
return 0.7 * best_match + 0.3 * top_3_avg
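Usage is two lines (the summary string here is made up for illustration). Construct the scorer once per batch so the model only loads once:
scorer = SemanticScorer()
similarity = scorer.score(
    "Why Cursor Won the IDE War",
    "How an AI-native editor displaced incumbent IDEs and what founders can copy.",
)
# similarity is 0.0-1.0; the next section maps it to 0-3 bonus points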
Converting Semantic Score to Points
Raw cosine similarity ranges from about 0.15 (totally irrelevant) to 0.75 (perfect match). We map this to 0-3 bonus points:
def semantic_bonus(similarity: float) -> float:
"""Map cosine similarity to 0-3 point bonus."""
if similarity < 0.30:
return 0.0 # not relevant
elif similarity < 0.40:
return 0.5 # slightly relevant
elif similarity < 0.50:
return 1.0 # relevant
elif similarity < 0.60:
return 2.0 # very relevant
else:
return 3.0 # perfect match
The Full Scoring Pipeline
Here’s the complete implementation that ties everything together:
import math
import re
from datetime import datetime, timezone
from dataclasses import dataclass
from typing import Optional
@dataclass
class Article:
title: str
summary: str
source: str
url: str
published_at: datetime
hn_points: int = 0
tier: int = 3 # 0=curated newsletter, 1-4=other sources
@dataclass
class ScoredArticle:
article: Article
raw_score: float
keyword_score: float
semantic_score: float
freshness: float
source_mult: float
trending_mult: float
category: str
final_score: float
def score_article(
article: Article,
semantic_scorer: Optional['SemanticScorer'] = None,
) -> ScoredArticle:
text = f"{article.title} {article.summary}".lower()
# 1. Keyword scoring
kw_score = 0.0
# Tier 1: paradigm shift
t1_hits = sum(1 for kw in PARADIGM_SHIFT_SIGNAL if kw in text)
kw_score += min(t1_hits * 2.0, 8.0)
# Tier 2: actionable
t2_hits = sum(1 for kw in ACTIONABLE_SIGNAL if kw in text)
kw_score += min(t2_hits * 1.0, 6.0)
# Tier 3: infrastructure
t3_hits = sum(1 for kw in INFRA_SIGNAL if kw in text)
kw_score += min(t3_hits * 0.3, 2.0)
# Tier bonus (curated sources)
kw_score += tier_bonus.get(article.tier, 0.0)
# 2. Penalties
# Research paper penalty
research_hits = sum(1 for kw in RESEARCH_INDICATORS if kw in text)
if research_hits >= 3:
kw_score *= 0.2
elif research_hits == 2:
kw_score *= 0.4
elif research_hits == 1:
kw_score *= 0.7
# Skip patterns (instant kill)
for pattern in SKIP_PATTERNS:
if re.search(pattern, text):
return ScoredArticle(
article=article, raw_score=0.5, keyword_score=0.5,
semantic_score=0.0, freshness=1.0, source_mult=1.0,
trending_mult=1.0, category="skipped", final_score=0.5,
)
# 3. Source multiplier
source_mult = SOURCE_BOOST.get(article.source, 1.0)
source_mult = SOURCE_PENALTY.get(article.source, source_mult)
# 4. Trending boost
trending_mult = 1.0
if article.hn_points >= 500:
trending_mult = 1.8
elif article.hn_points >= 200:
trending_mult = 1.5
elif article.hn_points >= 100:
trending_mult = 1.3
# 5. Freshness decay
freshness = freshness_multiplier(article.published_at)
# 6. Semantic scoring (optional)
sem_score = 0.0
sem_bonus = 0.0
if semantic_scorer:
sem_score = semantic_scorer.score(article.title, article.summary)
sem_bonus = semantic_bonus(sem_score)
# 7. Category
category = categorize_article(article.title, article.summary)
cat_boost = CATEGORY_RULES.get(category, {}).get("boost", 1.0)
# 8. Combine everything
raw = kw_score + sem_bonus
final = raw * source_mult * trending_mult * freshness * cat_boost
final = min(final, 10.0) # cap at 10
return ScoredArticle(
article=article,
raw_score=raw,
keyword_score=kw_score,
semantic_score=sem_score,
freshness=freshness,
source_mult=source_mult,
trending_mult=trending_mult,
category=category,
final_score=round(final, 2),
)
def score_batch(
articles: list[Article],
semantic_scorer: Optional['SemanticScorer'] = None,
top_n: int = 20,
) -> list[ScoredArticle]:
"""Score all articles, enforce category constraints, return top N."""
scored = [score_article(a, semantic_scorer) for a in articles]
# Sort by final score
scored.sort(key=lambda x: x.final_score, reverse=True)
# Enforce category min/max constraints
result = []
cat_counts = {}
# First pass: guarantee minimums
for cat, rules in CATEGORY_RULES.items():
min_count = rules.get("min_per_digest", 0)
cat_articles = [s for s in scored if s.category == cat]
for a in cat_articles[:min_count]:
if a not in result:
result.append(a)
cat_counts[cat] = cat_counts.get(cat, 0) + 1
# Second pass: fill remaining slots by score, respecting max
for s in scored:
if len(result) >= top_n:
break
if s in result:
continue
cat = s.category
max_count = CATEGORY_RULES.get(cat, {}).get("max_per_digest", top_n)
if cat_counts.get(cat, 0) < max_count:
result.append(s)
cat_counts[cat] = cat_counts.get(cat, 0) + 1
# Sort final result by score
result.sort(key=lambda x: x.final_score, reverse=True)
return result[:top_n]
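Putting it together on a couple of articles (titles, URLs, and point counts are placeholders; pass semantic_scorer=None to stay keyword-only). This reuses the imports and classes defined above:
articles = [
    Article(
        title="Solo founder replaces 10 engineers with an autonomous coding agent",
        summary="How one person ships a production SaaS with Claude and an agent loop.",
        source="Latent Space", url="https://example.com/a",
        published_at=datetime.now(timezone.utc), hn_points=240, tier=1,
    ),
    Article(
        title="A Bayesian ablation study of stochastic prompt routing",
        summary="We propose a theoretical framework; in this paper we present benchmarks.",
        source="ArXiv", url="https://example.com/b",
        published_at=datetime.now(timezone.utc), tier=3,
    ),
]

for s in score_batch(articles, semantic_scorer=None, top_n=20):
    print(f"{s.final_score:>5}  {s.category:<15} {s.article.title}")
The first article picks up Tier 1 keywords, a source boost, and a trending boost; the second gets crushed by the research penalty and the ArXiv multiplier, which is the intended behavior.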
Calibrating Thresholds
The numbers above (keyword weights, similarity cutoffs, decay half-life) aren’t magic. Here’s how to find yours.
Step 1: Build a Labeled Set
Take 100 articles you’ve already rated. You need at least 30 “good” (would include in newsletter) and 30 “bad” (would skip). Export them as JSON:
labeled = [
{"title": "...", "summary": "...", "source": "...", "good": True},
# ... 100 articles
]
Step 2: Score With Current Weights
Run your scoring pipeline on the labeled set. Then check precision and recall:
def evaluate(scored_articles, threshold=5.0):
tp = sum(1 for a in scored_articles if a.final_score >= threshold and a.article.good)
fp = sum(1 for a in scored_articles if a.final_score >= threshold and not a.article.good)
fn = sum(1 for a in scored_articles if a.final_score < threshold and a.article.good)
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"Threshold: {threshold}")
print(f"Precision: {precision:.1%} (of articles above threshold, how many are good?)")
print(f"Recall: {recall:.1%} (of good articles, how many scored above threshold?)")
return precision, recall
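evaluate expects each ScoredArticle's underlying article to carry the good label. One way to wire that up from the labeled list (a sketch; score_labeled is a hypothetical helper, and the empty url plus fresh timestamp are placeholders so freshness decay doesn't skew calibration):
def score_labeled(labeled: list[dict]) -> list[ScoredArticle]:
    scored = []
    for item in labeled:
        art = Article(
            title=item["title"], summary=item["summary"], source=item["source"],
            url="", published_at=datetime.now(timezone.utc),
        )
        art.good = item["good"]  # attach the label; plain dataclasses accept extra attributes
        scored.append(score_article(art))
    return scored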
Step 3: Find the Sweet Spot
Run evaluate at thresholds from 3.0 to 7.0 in 0.5 increments. You want:
- Precision > 80%: you’re not drowning in noise
- Recall > 70%: you’re not missing good stuff
If precision is low, raise keyword weights or add more penalty patterns. If recall is low, lower your threshold or add more signal keywords.
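The sweep itself is a short loop, using the score_labeled sketch from Step 2:
scored = score_labeled(labeled)
for threshold in [t / 2 for t in range(6, 15)]:  # 3.0, 3.5, ..., 7.0
    evaluate(scored, threshold=threshold)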
Step 4: Iterate Weekly
Every time you manually override the system (promote a low-scored article, skip a high-scored one), that’s a calibration signal. Log those overrides. After 2 weeks, re-run the evaluation. Your numbers will improve.
Keyword-Only vs Keyword+Embedding: Benchmarks
We ran both pipelines on 4 weeks of real data (2,800 articles, 80 manually curated digests). Here’s what we found:
| Metric | Keywords Only | Keywords + Embeddings |
|---|---|---|
| Precision @ top 20 | 72% | 84% |
| Recall @ top 20 | 68% | 79% |
| “Surprise” catches (good articles with 0 keyword hits) | 0 | 14 per week avg |
| Processing time (700 articles) | 0.8s | 4.2s |
| Setup complexity | Zero dependencies | pip install + 80MB model |
The big win isn’t the precision bump. It’s the 14 articles per week that keywords completely miss. These tend to be the most interesting ones because they use unexpected language. An article titled “The $0 Stack That Prints Money” scores zero on keywords but the semantic scorer nails it because it’s about bootstrapped AI automation.
When to Skip Embeddings
- You have fewer than 200 articles per batch (keywords are enough)
- You can’t install Python dependencies (serverless with tight size limits)
- You’re still in the first 2 weeks (get your keyword lists right first)
When Embeddings Are Worth It
- Your newsletter covers a niche topic where vocabulary is unpredictable
- You keep finding great articles that your keyword scorer missed
- You have the 4 seconds to spare (you do)
Results
From 700+ articles per batch:
- ~200 filtered as noise (score 0)
- ~400 scored low (1-3)
- ~70 scored medium (4-6)
- ~30 scored high (7-10)
- Top 20 shown to you
The whole thing costs $0 to run.