Automatic OCR: Meme of LinOnetwo (Lin Onetwo's memes and ideas)

Automatic OCR

2020-04-15 23:43

Ref: apple.stackexchange

brew install tesseract

Download the Chinese recognition data (chi_sim.traineddata for Simplified Chinese) from tesseract-ocr/tessdata and put it in /usr/local/share/tessdata/
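To check that the data file landed where tesseract looks for it, one option is to inspect the output of tesseract --list-langs. The helper below is only a sketch: has_lang is a hypothetical name, and it assumes the installed languages are printed one per line, which is how recent tesseract versions behave.

```shell
# has_lang LANG: succeed if LANG appears as a whole line on stdin.
# Hypothetical helper; assumes one language code per line.
has_lang() {
  grep -qxF -- "$1"
}

# Possible usage once tesseract is installed (2>&1 in case the list
# is printed to stderr):
#   tesseract --list-langs 2>&1 | has_lang chi_sim && echo "chi_sim is available"
```

If the check fails, the most likely cause is that the .traineddata file is in a different directory than the one tesseract reads.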

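Then try it on a screenshot, printing the result to stdout and stripping all whitespace (the Chinese recognition output is littered with spurious spaces):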

tesseract ~/Desktop/aaa.png stdout -l chi_sim | tr -d "[:space:]"

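If the image also contains English, though, stripping whitespace is no longer a good idea, because it would glue the English words together: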

tesseract ~/Desktop/aaa.png stdout -l chi_sim+eng

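In Automator, create a Folder Action named OCRScreenshot, set it to watch the Desktop folder, then add a Run Shell Script action. Make sure Pass Input is set to "as arguments". The script is adapted from the note on automatically compressing screenshots.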

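# Each file newly added to the watched folder arrives as an argument.
for f in "$@"
do
  # image/*g matches MIME types ending in "g", such as image/png and image/jpeg.
  if [[ $(file --mime-type -b "$f") == image/*g ]]; then
    # OCR the image and copy the recognized text to the clipboard.
    /usr/local/bin/tesseract "$f" stdout -l chi_sim+eng | pbcopy
  fi
done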

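Note that you have to call /usr/local/bin/tesseract rather than plain tesseract, otherwise the script fails with zsh:4: command not found: tesseract. This is presumably because Automator runs the script with a minimal PATH that does not include /usr/local/bin.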
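If hard-coding the path feels brittle, for example because Homebrew on Apple Silicon Macs installs under /opt/homebrew instead of /usr/local, the script could look the binary up itself. The following is a minimal sketch under that assumption; find_tool is a hypothetical helper, not something Automator or Homebrew provides.

```shell
# find_tool NAME: print the first executable NAME found in the usual
# Homebrew prefixes, falling back to whatever the current PATH resolves.
find_tool() {
  for dir in /opt/homebrew/bin /usr/local/bin; do
    if [ -x "$dir/$1" ]; then
      printf '%s\n' "$dir/$1"
      return 0
    fi
  done
  command -v "$1"
}

# Possible usage at the top of the Folder Action script:
#   TESSERACT=$(find_tool tesseract) || exit 1
#   "$TESSERACT" "$f" stdout -l chi_sim+eng | pbcopy
```

A simpler alternative with the same effect is to prepend the Homebrew directory to PATH at the start of the script.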