refactor(query): extract SOCIAL_NOISE and VIRAL_NOISE shared sets #437
iliaal wants to merge 1 commit into
Greptile Summary
This PR extracts six near-identical inline noise frozensets from adapter
Confidence Score: 5/5
Pure mechanical refactoring: every adapter's effective noise set is byte-for-byte identical before and after, verified by tests and manual count. Safe to merge.
All seven adapters produce frozensets identical in content to their pre-refactor inline definitions; the new shared constants are pinned by membership-equality tests; no logic, API surface, or call sites were modified.
No files require special attention; the only open nit is the inline frozenset in TestCustomNoise.test_custom_noise_keeps_tips that duplicates VIRAL_NOISE.
@iliaal fix conflicts please
Six adapters defined near-identical noise frozensets inline inside their _extract_core_subject wrapper. Move the shared sets to lib/query.py as SOCIAL_NOISE (18 words, used by Bluesky/Threads/Truth Social) and VIRAL_NOISE (25 words = SOCIAL_NOISE + 7 extras, used by TikTok/Instagram/Pinterest); have the adapters reference them.
YouTube extends VIRAL_NOISE with temporal/meta tokens (months, recent year strings, etc.) that the planner emits but YouTube titles don't carry. Now composed as VIRAL_NOISE | _YT_EXTRA.
Wrappers stay; they document each adapter's noise choice and avoid forcing callsites to know the right set.
Polymarket's prefix-stripping _extract_core_subject and reddit's NOISE_WORDS default are out of scope.
Set arithmetic verified: old _YT_NOISE (52 items) = new VIRAL_NOISE | _YT_EXTRA (25 + 27 = 52). Zero behavior change.
6332b6a to 08f16b3
@tmchow rebased on top of main. Conflicts (all in the test files touched by the conftest.py centralization that merged earlier today) are resolved.
Summary
Six adapters defined near-identical noise frozensets inline inside their `_extract_core_subject` wrapper. The architecture review surfaced this in the `_extract_core_subject` consolidation finding (Architecture F5). Move the shared sets to `lib/query.py` as `SOCIAL_NOISE` and `VIRAL_NOISE`; have the adapters reference them.

Mapping

- `SOCIAL_NOISE` (18 words): Bluesky, Threads, Truth Social. Short-form micro-social platforms where research/meta words rarely appear in post bodies.
- `VIRAL_NOISE` (25 words = SOCIAL_NOISE + 7 extras): TikTok, Instagram, Pinterest. Adds `killer`, the prompt-meta cluster, and the methodology cluster.
- YouTube extends `VIRAL_NOISE` with temporal/meta tokens (months, recent year strings, etc.) that the planner emits but YouTube titles don't carry. Now composed as `VIRAL_NOISE | _YT_EXTRA`.

What's NOT in scope

- Polymarket's `_extract_core_subject` is a custom prefix-stripper that doesn't go through `query.extract_core_subject` at all. Out of scope.
- Reddit's `NOISE_WORDS` default. Out of scope.

Test plan

- `TestSharedAdapterNoiseSets` pins SOCIAL_NOISE membership (18 words), VIRAL_NOISE = SOCIAL_NOISE + 7 specific extras, and verifies extract_core_subject behavior with both sets.
- Set arithmetic verified: old `_YT_NOISE` (52 items) = new `VIRAL_NOISE | _YT_EXTRA` (25 + 27 = 52). Zero behavior change for YouTube.
- `pytest tests/ --ignore=tests/test_exa_search.py`: 1556 passed, 4 skipped.
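The 25 + 27 = 52 count only holds if the two sets share no members, since a union drops duplicates. A minimal sketch of that check is below; `is_lossless_split` is a hypothetical helper, not part of the PR's test suite, and the tokens are generated placeholders rather than the real noise words.

```python
def is_lossless_split(old, base, extra):
    """True when base | extra rebuilds old exactly and the parts do not overlap."""
    return (
        base.isdisjoint(extra)
        and (base | extra) == old
        and len(old) == len(base) + len(extra)
    )


# Placeholder tokens sized like the sets in the PR (25 and 27 entries).
base = frozenset(f"viral_{i}" for i in range(25))
extra = frozenset(f"yt_{i}" for i in range(27))
old = frozenset(list(base) + list(extra))

# Same size as extra, but one member also lives in base, so the
# union falls one short of the original set.
overlapping_extra = frozenset({"viral_0"}) | frozenset(f"yt_{i}" for i in range(26))

lossless = is_lossless_split(old, base, extra)
lossy = is_lossless_split(old, base, overlapping_extra)
```

Here `lossless` is true and `lossy` is false: the overlapping split has the right member counts on paper but cannot reproduce all 52 original entries.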