123 Commits

Author SHA1 Message Date
Tony
6251a2dc02 fix(core): parse date utils (#10789)
* fix(core): parse date utils

* fix: regex
2022-09-14 21:59:32 +08:00
DIYgod
9329a44c80 feat(core): support proxy config for puppeteer (#10714)
* feat: support proxy config for puppeteer

* test: add puppeteer proxy detection

* fix: package.json

* test: fix regex
2022-09-07 00:07:04 +08:00
Tony
9fa248c324 fix(route): nmpa feed items on 3rd party site (#10146)
* fix(route): nmpa feed items on 3rd party site

* fix: regex unescaped .
2022-07-06 00:46:05 +08:00
Atlas Quan
bd80574f76 feat(route): add slowmist news (#10044)
* feat(route): add slowmist news

* feat(utils): remove useless ul for code section

* docs: add slowmist news route docs

* docs(quick-start): new router under /lib/v2

* test(utils): add test fixArticleContent for new wechat-mp code

* feat(router): update radar source of slowmist
2022-07-04 23:44:58 +08:00
Tony
2f26479fc9 feat(route): sspai series update (#9901) 2022-06-05 23:59:10 +08:00
github-actions[bot]
5bc8be9ac5 style: auto format 2022-06-05 14:42:33 +00:00
Tony
450e522167 feat(utils): decode cf protected email string (#9900)
* feat(utils): decode cf protected email string

* Update cf-email.js
2022-06-05 22:40:57 +08:00
Tony
a5fd8de30e fix(docker): puppeteer stealth not working in docker (#9896) 2022-06-05 17:09:34 +08:00
MisLink
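df6f4caf8b feat(route): add National School of Development, Peking University (北京大学国家发展研究院) - Opinions (#9804)
* feat(route): add National School of Development, Peking University (北京大学国家发展研究院) - Opinions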

Signed-off-by: MisLink <gjq.uoiai@outlook.com>

* Fix https issue

Signed-off-by: MisLink <gjq.uoiai@outlook.com>

* fetch full text from wechat-mp and pku news

Signed-off-by: MisLink <gjq.uoiai@outlook.com>

* Fix ci

Signed-off-by: MisLink <gjq.uoiai@outlook.com>

* refactor: sort new route
2022-05-27 19:12:04 +08:00
Tony
0f31bfa8b9 fix(utils): parse relative date with meridiem (#9775)
* fix(utils): parse relative date with meridiem

* fix: use regex
2022-05-17 20:21:47 +08:00
Rongrong
23fcb6bc5a feat(core/utils/request-wrapper): request logging (#9691)
Signed-off-by: Rongrong <i@rong.moe>
2022-05-04 12:36:48 +10:00
Tony
0b544e1395 feat(utils): puppeteer-extra-plugin-stealth (#9676) 2022-05-03 01:17:17 +08:00
Rongrong
7a6be9a229 feat(core): customizable Chromium executable path (#9670)
* feat(core): customizable Chromium executable path

also build Chromium-bundled Docker image for arm/arm64

Signed-off-by: Rongrong <i@rong.moe>

* chore: fix typo

Signed-off-by: Rongrong <i@rong.moe>

* chore(CI/test): using build matrix

Signed-off-by: Rongrong <i@rong.moe>

* docs(install): fix punctuation

Signed-off-by: Rongrong <i@rong.moe>
2022-05-01 21:00:29 +08:00
Tony
34b58ebc64 fix(utils): request without hostname (#9649) 2022-04-28 19:35:31 +08:00
任平生
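9d9926d0bf fix(utils): fix the error thrown when fetching deleted WeChat articles (#9589)
* fix(utils): support fetching single-image WeChat Official Account articles

* fix(utils): support outputting the "Read the original" link for reposted WeChat Official Account articles

* fix(utils): fix the error thrown when fetching deleted WeChat articles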

* refactor: migrate to v2

Co-authored-by: blankyu(于海洋) <blankyu@tencent.com>
2022-04-21 21:40:16 +08:00
dependabot[bot]
dd4a216648 chore(deps): bump socks-proxy-agent from 6.1.1 to 6.2.0 (#9572)
* chore(deps): bump socks-proxy-agent from 6.1.1 to 6.2.0

Bumps [socks-proxy-agent](https://github.com/TooTallNate/node-socks-proxy-agent) from 6.1.1 to 6.2.0.
- [Release notes](https://github.com/TooTallNate/node-socks-proxy-agent/releases)
- [Changelog](https://github.com/TooTallNate/node-socks-proxy-agent/blob/master/CHANGELOG.md)
- [Commits](https://github.com/TooTallNate/node-socks-proxy-agent/compare/v6.1.1...v6.2.0)

---
updated-dependencies:
- dependency-name: socks-proxy-agent
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: use dot notation

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: TonyRL <TonyRL@users.noreply.github.com>
2022-04-21 03:54:53 +08:00
Rongrong
0522c63d5f fix(core/utils): invalid request header Server (#9582)
Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>
2022-04-20 22:32:46 +08:00
任平生
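f3e069d399 fix(utils): support fetching single-image WeChat Official Account articles; add output of the "Read the original" link (#9557)
* fix(utils): support fetching single-image WeChat Official Account articles

* fix(utils): support outputting the "Read the original" link for reposted WeChat Official Account articles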
2022-04-19 01:29:45 +08:00
Levi Zim
958be6266e feat(route): Shandong University, Weihai news site (#9537)
* feat(sduwh): add extractors.

* feat(route): add route for Shandong University, Weihai news site

* docs: for route sduwh/news

* docs: for route sduwh/news

(cherry picked from commit 831830167a)

* feat(radar): for route Shandong University, Weihai news site

* refactor: change `got.get` to `got`.

* refactor: prefer `parseDate()` to `new Date()`

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* fix: incomplete URL substring sanitization.

Make CodeQL happy.

* fix(radar): fix target field.

* fix: change route /sduwh to /sdu/wh

* fix: remove superfluous slash character in url.

* feat: look for exact date first.

* feat: extract exact date from news extractor.

* feat: extract exact date from view extractor.

* feat: extractor for www.sdrj.sdu.edu.cn

* refactor: semantic separation of sduwh with sdu

* feat(radar): more accurate name

* docs: update documentation

* refactor: migrate to v2

* refactor: fix deprecated url.resolve

* fix: update docs url

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* fix: sdu not working routes

* fix: accurate `ctx.state.data.url`

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* fix: better error handling for extractors.

* fix: timezone

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* fix: better error handling.

Co-authored-by: Tony <TonyRL@users.noreply.github.com>
2022-04-17 00:01:39 +08:00
任平生
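eb467afae1 fix(utils): support fetching the full text of reposted WeChat Official Account articles (#9534)
* feat: fix the date-time matching rules and remove some unnecessary comment elements

* fix(route)(fortunechina): fix Fortune China: 1. duplicated Chinese content in bilingual articles; 2. remove large KOL avatars

* fix(route)(wechat): support fetching the full text of reposted WeChat Official Account articles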
2022-04-15 19:23:09 +08:00
Tony
bafb3534e1 feat(utils): random user agent (#9449)
* feat(utils): random ua

* chore: bump rand-user-agent to 1.0.58(no more deps)
2022-04-12 17:51:07 +08:00
Ethan Shen
a25fd4b67f fix(utils): wrong date with same weekday in parse-date (#9506) 2022-04-10 23:40:35 +08:00
Chenxing Luo
ef94bcde8e feat(route): add two blogs: Stratechery & Miris Whispers (#9496)
* Add two blogs

* fix(utils): add parseDate timezone common-config

* Update lib/v2/stratechery/index.js

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* Update docs/blog.md

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* Update docs/en/blog.md

Co-authored-by: Tony <TonyRL@users.noreply.github.com>

* Update lib/v2/miris/blog.js
2022-04-08 23:48:28 +08:00
Rongrong
74e1f88a32 feat(core)(utils/wechat-mp): normalize URL (#9497)
Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>
2022-04-08 18:42:52 +08:00
Rongrong
a79cc20ec1 feat(utils): add utils for WeChat MP (#9487)
Motivation:
There are multiple routes that need to fetch articles from WeChat MP.
However, letting them fetch articles by themselves could potentially
lead to cache key collisions. Even if cache key collisions do not occur,
un-normalized URL could potentially lead to duplicated requests.
What's more, articles from WeChat MP have odd formatting that needs to be
fixed. A universal function for this work makes things easier for new
route contributors.

Note:
To keep this PR as atomic as possible, I did not touch
those broken routes. Once this PR is merged, I will try to fix them.

Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>
2022-04-07 21:46:15 +08:00
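
The motivation in a79cc20ec1 (un-normalized URLs causing duplicated requests and cache key collisions) can be illustrated with a minimal sketch. This is not the code from the commit: `normalizeArticleUrl`, `cacheKeyFor`, the `wechat-mp:` key prefix, and the list of identifying query parameters are all assumptions made for illustration. It relies only on the standard WHATWG `URL` API available in Node.js.

```javascript
// Hypothetical helper, NOT the actual RSSHub implementation.
// Assumption: these query parameters are enough to identify an article;
// everything else (tracking parameters, fragments) is dropped.
const IDENTIFYING_PARAMS = ['__biz', 'mid', 'idx', 'sn'];

function normalizeArticleUrl(rawUrl) {
    const url = new URL(rawUrl);
    url.protocol = 'https:'; // treat http and https links as the same article
    url.hash = ''; // drop fragments such as #rd

    // Rebuild the query string in a fixed order so that parameter order
    // in the incoming link does not change the result.
    const kept = new URLSearchParams();
    for (const name of IDENTIFYING_PARAMS) {
        const value = url.searchParams.get(name);
        if (value !== null) {
            kept.set(name, value);
        }
    }
    url.search = kept.toString();
    return url.toString();
}

// Hypothetical cache key: one key per normalized URL, shared by all routes.
function cacheKeyFor(rawUrl) {
    return `wechat-mp:${normalizeArticleUrl(rawUrl)}`;
}
```

With a shared helper of this kind, two routes that see the same article through differently decorated links would compute the same cache key and trigger a single request.
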
Rongrong
5b35471e39 fix(core): offending RFC4287 (#9441)
* fix(core): offending RFC4287

should not leave `<updated>` blank when `<published>` is not blank
these two fields MUST conform to the "date-time" production in RFC3339

Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>

* test(common-utils): complete tests

Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>

* test(template): restrict expected value of `pubDate`

Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>
2022-04-02 17:44:45 +08:00
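
The rule stated in 5b35471e39 (do not leave `<updated>` blank when `<published>` is set, and emit both as RFC 3339 date-times) can be sketched as follows. This is an illustrative sketch, not the code from the commit: `toRfc3339`, `atomDates`, and the `pubDate`/`updated` field names on the item are assumptions. It uses `Date.prototype.toISOString()`, whose output is a valid RFC 3339 date-time in UTC.

```javascript
// Hypothetical helpers, NOT the actual RSSHub implementation.

// Convert anything Date can parse into an RFC 3339 string, or undefined.
function toRfc3339(value) {
    if (value === undefined || value === null) {
        return undefined;
    }
    const date = new Date(value);
    if (Number.isNaN(date.getTime())) {
        return undefined; // unparseable input: emit nothing rather than garbage
    }
    return date.toISOString();
}

// Assumed item shape: { pubDate, updated }, both optional.
function atomDates(item) {
    const published = toRfc3339(item.pubDate);
    // Fall back to the publication date so <updated> is never blank
    // while <published> has a value.
    const updated = toRfc3339(item.updated) ?? published;
    return { published, updated };
}
```

For an item that carries only a `pubDate`, both fields end up with the same timestamp; an item with neither date yields `undefined` for both, and the caller has to decide how to handle that case.
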
Rongrong
95db3b4e99 fix(core): torrent searching error (#9407)
Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>
2022-03-29 21:12:16 +08:00
Tony
cd151f45ab fix(utils): parseDate custom format not working 2022-03-28 22:25:40 +08:00
Ethan Shen
a3396029ec fix(utils): wrong last weekday for relative date (#9397) 2022-03-27 23:39:44 +08:00
Ethan Shen
e42bdf3d97 fix(utils): typo in parse-date (#9391)
* fix(utils): typo in parse-date

* fix: add `后日`
2022-03-26 19:44:27 +08:00
Ethan Shen
d5648b37ee fix(utils): parse relative dates with multiple time units (#9365)
* fix(utils): parse relative dates with multiple time units

* docs: remove warning

* fix: add more characters to match

* fix: rename to parse-date
2022-03-26 16:45:42 +08:00
Rongrong
c48ca6bd5b fix(core): invalid feed fields (#9286)
Signed-off-by: Rongrong <15956627+Rongronggg9@users.noreply.github.com>
2022-03-22 02:13:15 +08:00
Tony
2c2bf4ed09 fix: typo in puppeteer options 2022-03-06 23:42:32 +08:00
DIYgod
35c9834049 fix: puppeteer userDataDir 2021-11-28 01:21:42 +00:00
NeverBehave
0792f7ba25 feat(core): first attempt to init script standard (#8224)
- lazy load
- rate limit per path
- init .debug.json support
- docs
- maintainer
- radar
2021-09-22 05:41:00 -07:00
Daniel Li (李丹阳)
d77a039f05 style(eslint): add no-implicit-coercion rule (#8175)
* refactor: add no-implicit-coercion rule for ESLint

* fix: errors from deepscan

* fix: errors from deepscan

* fix: errors from deepscan

* fix: errors from deepscan

* fix: errors from deepscan

* Update docs/en/joinus/quick-start.md

Co-authored-by: Sukka <isukkaw@gmail.com>

* Update docs/joinus/quick-start.md

Co-authored-by: Sukka <isukkaw@gmail.com>

* Update lib/routes/av01/tag.js

Co-authored-by: Sukka <isukkaw@gmail.com>

* Update lib/routes/gov/taiwan/mnd.js

Co-authored-by: Sukka <isukkaw@gmail.com>

* Update lib/routes/ps/product.js

Co-authored-by: Sukka <isukkaw@gmail.com>

* refactor: minify html string

Co-authored-by: Sukka <isukkaw@gmail.com>
2021-09-15 21:22:11 +08:00
Sukka
31720bbb1b perf: lazy require dependencies (#8025) 2021-08-20 14:05:57 +08:00
Sukka
d82847f541 style/chore(eslint): enforce new rules (#8040)
* style: prefer object shorthand syntax
* refactor: prefer Array#map over Array#forEach
* style: prefer arrow callback
* chore(eslint): update rules
* style: auto fix by eslint
2021-08-17 22:23:23 +08:00
Sukka
6e3b58ed1d refactor: avoid promise overhead (#8028) 2021-08-16 11:45:53 -07:00
Chih-Hsuan Yen
1c9c4ccfc8 fix(core): make sure timeout error messages include URLs (#7981)
Before this fix, timeout messages are not quite useful

> error: Request undefined fail, retry attempt #1: TimeoutError: Timeout awaiting 'request' for 5000ms
2021-08-12 01:27:07 -07:00
Queensferry
65e74a1c5e chore(utils): parse-date supports relative time & fix routes (#7530) 2021-05-14 23:08:33 -04:00
GitHub Action
e1b3b5d877 style: auto format 2021-05-08 21:49:05 +00:00
Queensferry
10f5bb7bce refactor: timezone conversion in lib/utils/date.js (#7438) 2021-05-08 17:45:37 -04:00
DIYgod
89e82d88fa feat: got request timeout 2021-02-01 20:06:49 +08:00
DIYgod
c5e3a27f44 feat: auto add headers.host 2020-12-17 18:54:29 +08:00
Herb Brewer
d4bdf8c7e8 feat: add Douban user wish list (#6285) 2020-12-04 15:44:33 +00:00
Shun Zi
3442ca9196 feat: more readable twitter tweet (#6051) 2020-10-30 09:09:10 +00:00
Henry
ec49562269 Revert #5271
Incomplete PR. #5261
2020-07-29 17:25:41 +01:00
sabuaka18
eddba23099 feat: javbus routes: director label studio (#5271)
Co-authored-by: zrenca <42361841+zrenca@users.noreply.github.com>
2020-07-28 15:59:30 +01:00
hoilc
90030ec017 fix anitama date (#5255) 2020-07-28 05:54:43 +01:00