Python 3.x - Azure Computer Vision API handwriting recognition from a locally stored image file

okwaves2024-01-25 9

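I am trying to sharpen my coding and cloud computing skills by exploring Azure. I want to automate part of an administrative task that involves deciphering a large number of handwritten documents and storing the text electronically.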

The Python code below is a merge of two code sources:

Taygan Rifat's blog: https://www.taygan.co/blog/2018/4/28/image-processing-with-cognitive-services

Microsoft's own demo code: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/python-hand-text

import json
import os
import sys
import requests
import time
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from PIL import Image
from io import BytesIO

subscription_key = 'XX79fdc005d542XXXb5f29ce04ab1cXXX'
# The endpoint already ends with '/', so the paths below must not start with one.
endpoint = 'https://handwritng.cognitiveservices.azure.com/'
analyze_url = endpoint + "vision/v3.0/analyze"
text_recognition_url = endpoint + "vision/v3.0/read/analyze"

image_url = "https://3j2w6t1pktei3iwq0u47sym8-wpengine.netdna-ssl.com/wp-content/uploads/2014/08/Handwriting-sample-Katie.png"

headers = {'Ocp-Apim-Subscription-Key': subscription_key}
data = {'url': image_url}
response = requests.post(
    text_recognition_url, headers=headers, json=data)
response.raise_for_status()

# Extracting text requires two API calls: One call to submit the
# image for processing, the other to retrieve the text found in the image.

# Holds the URI used to retrieve the recognized text.
operation_url = response.headers["Operation-Location"]

# The recognized text isn't immediately available, so poll to wait for completion.
analysis = {}
while True:
    response_final = requests.get(operation_url, headers=headers)
    analysis = response_final.json()

    print(json.dumps(analysis, indent=4))

    # Stop once the result is available or the service reports a failure.
    if "analyzeResult" in analysis or analysis.get("status") == "failed":
        break
    time.sleep(1)

polygons = []
if ("analyzeResult" in analysis):
    # Extract the recognized text, with bounding boxes.
    polygons = [(line["boundingBox"], line["text"])
                for line in analysis["analyzeResult"]["readResults"][0]["lines"]]

# Display the image and overlay it with the extracted text.
image = Image.open(BytesIO(requests.get(image_url).content))
ax = plt.imshow(image)
for polygon in polygons:
    vertices = [(polygon[0][i], polygon[0][i + 1])
                for i in range(0, len(polygon[0]), 2)]
    text = polygon[1]
    print(text)
    patch = Polygon(vertices, closed=True, fill=False, linewidth=2, color='y')
    ax.axes.add_patch(patch)
    plt.text(vertices[0][0], vertices[0][1], text, fontsize=20, va="top")


plt.show()

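What I would like is help modifying the script so that it can process an image file stored locally (instead of using a URL).

Currently I work around this by spinning up an IIS server on an Azure virtual machine and accessing the image I want to analyze through its URL over HTTP. This is a little unwieldy (and somewhat insecure for my purposes).

Thanks, WL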

------------------------

In that case, read the image file and send its bytes as the request body:

...
# You could also read the image file name from command line
# as the first argument passed to your script:

# try:
#    input_image = sys.argv[1]
# except IndexError:
#    sys.exit('No input. Pass input image file name as first argument.')

input_image = "your_input_image.jpg"
with open(input_image, 'rb') as f:
    data = f.read()
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-type': 'application/octet-stream'
    }
    response = requests.post(
        text_recognition_url, headers=headers, data=data)
    response.raise_for_status()
...

and then, when displaying the result, open the local file instead of downloading it:

# Display the image and overlay it with the extracted text.
image = Image.open(input_image)
...
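
Everything between the POST and the display step, including the polling loop from the question, should work unchanged for a local file, because the response still carries the Operation-Location header. As a minimal sketch (not part of either original source), the polling could also be wrapped in a function with a timeout so a stuck operation does not loop forever. The name wait_for_read_result and its parameters are hypothetical; fetch_status stands in for whatever call returns the decoded JSON.

```python
import time


def wait_for_read_result(fetch_status, interval=1.0, timeout=60.0,
                         sleep=time.sleep):
    """Poll until the Read operation finishes, fails, or the timeout expires.

    fetch_status is a zero-argument callable returning the decoded JSON dict,
    for example:
        lambda: requests.get(operation_url, headers=headers).json()
    """
    waited = 0.0
    while True:
        analysis = fetch_status()
        # Same stop conditions as the loop in the question.
        if "analyzeResult" in analysis or analysis.get("status") == "failed":
            return analysis
        if waited >= timeout:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(interval)
        waited += interval
```

The returned dictionary can then be passed to the same bounding-box extraction code as before.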

Most Azure Cognitive Services that accept an image URL also accept raw bytes: set Content-type: application/octet-stream and send the binary image data as the POST payload.
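
To illustrate the difference between the two input styles, here is a small sketch of a helper that prepares the arguments for requests.post() from either a URL or a local path. The function name build_read_request is hypothetical, and treating anything that does not start with http:// or https:// as a local path is a simplifying assumption.

```python
def build_read_request(source, subscription_key):
    """Return keyword arguments for requests.post() for a URL or a local file."""
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    if source.lower().startswith(("http://", "https://")):
        # Remote image: the service fetches it from the URL given in a JSON body.
        return {"headers": headers, "json": {"url": source}}
    # Local image: send the raw bytes as the request body.
    with open(source, "rb") as f:
        data = f.read()
    headers["Content-Type"] = "application/octet-stream"
    return {"headers": headers, "data": data}
```

It would be used as requests.post(text_recognition_url, **build_read_request("your_input_image.jpg", subscription_key)), so the same script can handle both kinds of input.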

See "Analyze Image".

Supported input methods:

Raw image binary or image URL.

Content types:

Input requirements:

Supported image formats: JPEG, PNG, GIF, BMP. The image file size must be less than 4 MB. The image dimensions must be at least 50 x 50.
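
When processing many scans, it may be worth checking these requirements locally before uploading. The following is only a sketch using the standard library: it checks the file size and sniffs the format from the file's leading bytes. It does not check the 50 x 50 minimum dimensions (Pillow's Image.open(path).size could be used for that), and reading "4 MB" as 4 * 1024 * 1024 bytes is an assumption. The names check_image_file and sniff_format are hypothetical.

```python
import os

MAX_BYTES = 4 * 1024 * 1024  # "less than 4 MB", read here as 4 MiB (assumption)

# Leading bytes that identify each supported format.
SIGNATURES = {
    "JPEG": (b"\xff\xd8\xff",),
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "GIF": (b"GIF87a", b"GIF89a"),
    "BMP": (b"BM",),
}


def sniff_format(path):
    """Return the detected format name, or None if no signature matches."""
    with open(path, "rb") as f:
        head = f.read(8)
    for name, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):
            return name
    return None


def check_image_file(path):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 4 MB or larger")
    if sniff_format(path) is None:
        problems.append("not a JPEG, PNG, GIF, or BMP file")
    return problems
```

Files that return a non-empty list can be skipped or converted before they are sent to the service.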

By the way, if you ever need a quick web server for a future task, Python has you covered:

# usage:
# python3 -m http.server [-h] [--cgi] [--bind ADDRESS]
#                        [--directory DIRECTORY] [port]

$ python3 -m http.server
Serving HTTP on :: port 8000 (http://[::]:8000/) ...
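
The same server can also be started from inside a script. This is a sketch using only the standard library (Python 3.7 or later); the function name serve_directory is hypothetical. It binds to 127.0.0.1 by default, so it is only reachable from the local machine. A cloud service that downloads images by URL would not be able to reach it, which is one more reason to post the image bytes directly.

```python
import functools
import http.server
import threading


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread.

    port=0 lets the OS pick a free port; read it from server.server_address.
    Call server.shutdown() and server.server_close() to stop the server.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```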
