[OCR Study Notes] 5. Traditional Feature Extraction Methods for OCR (Python source code download at the end)

1 Introduction

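Traditional feature extraction methods are seen less and less these days, but industry still uses them in practice. They fall into two main categories: methods based on structural morphology and methods based on geometric distribution.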

A more detailed breakdown is shown in the figure below:

picture.image

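2 Feature Extraction Based on Structural Morphology

Features based on structural morphology are usually represented in one of two ways: 1) contour features and 2) region features.
These methods convert the structural morphology of a character image into a feature vector.
The main methods are:

Boundary feature methods
Fourier feature descriptors
Shape invariant moments
Geometric parameter methods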

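2.1 Boundary Feature Methods

Boundary feature methods focus on the features of the image boundary. The Hough transform and the boundary direction histogram are the two most typical boundary feature methods.

2.1.1 Hough Transform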

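A simple example illustrates the Hough transform. Draw a straight line in the image with equation $y = k_0 x + b_0$ and pick three points on it: $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Treating $k$ and $b$ as the unknowns, each point $(x_i, y_i)$ gives the parameter equation $b = -x_i k + y_i$, so the three points correspond to three different lines in the $(k, b)$ parameter plane. These three lines intersect at a single point, $(k_0, b_0)$. Likewise, the parameter equations of every other point on the original line also pass through $(k_0, b_0)$. This property means we can map points in the image plane to lines in the parameter plane and then use voting statistics to locate the line. If the image plane contains two lines, the parameter plane shows two peaks, and so on.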

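The basic idea of the Hough transform: a point in the original image coordinate system corresponds to a line in the parameter coordinate system, and conversely a point in the parameter coordinate system corresponds to a line in the original image. After every edge point of the image is projected into the parameter coordinate system, the locations where many of the projected lines accumulate identify the lines in the original image. In practice the polar form $\rho = x\cos\theta + y\sin\theta$ is used instead of $(k, b)$, because the slope of a vertical line is unbounded; in that form each image point maps to a sinusoidal curve in the $(\rho, \theta)$ plane.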
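The point-to-line duality described above can be checked numerically. The following is a minimal sketch, assuming the slope-intercept parameterization $y = kx + b$; the sample points and the helper name `parameter_space_intersection` are made up for illustration. Each image point $(x, y)$ contributes one line $x k + b = y$ in the $(k, b)$ plane, and the common intersection of those lines is found by least squares.

```python
import numpy as np


def parameter_space_intersection(points):
    """Find the (k, b) shared by the parameter-space lines of all points."""
    pts = np.asarray(points, dtype=float)
    # Each image point (x, y) gives one linear equation in (k, b): x*k + 1*b = y
    A = np.column_stack([pts[:, 0], np.ones(len(pts))])
    (k, b), *_ = np.linalg.lstsq(A, pts[:, 1], rcond=None)
    return k, b


# Hypothetical points, all lying on the line y = 2x + 1
points = [(0, 1), (1, 3), (2, 5), (3, 7)]
k, b = parameter_space_intersection(points)
print(k, b)
```

Because all four points lie on one line, their parameter-space lines meet in a single point whose coordinates are the slope and intercept of that line.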

A Python implementation of Hough line detection follows:


        
          
# -*- coding: utf-8 -*-  
import sys  
import numpy as np  
import cv2  
import math  
import matplotlib.pyplot as plt  
from mpl_toolkits.mplot3d import Axes3D  
  
  
# Hough transform in polar form: line detection
def HTLine(image, stepTheta=1, stepRho=1):
    # rows = image height, cols = image width
    rows, cols = image.shape
    # Longest possible perpendicular from the origin to a line (the image diagonal)
    L = round(math.sqrt(pow(rows - 1, 2.0) + pow(cols - 1, 2.0))) + 1
    # Initialize the vote accumulator
    numtheta = int(180.0 / stepTheta)
    numRho = int(2 * L / stepRho + 1)
    accumulator = np.zeros((numRho, numtheta), np.int32)
    # Dictionary recording which points voted for each (rho, theta) cell
    accuDict = {}
    for k1 in range(numRho):
        for k2 in range(numtheta):
            accuDict[(k1, k2)] = []
    # Voting
    for y in range(rows):
        for x in range(cols):
            if image[y][x] == 255:  # only edge pixels take part in the transform
                for m in range(numtheta):
                    # For each angle, compute the corresponding rho
                    rho = x * math.cos(stepTheta * m / 180.0 * math.pi) + y * math.sin(stepTheta * m / 180.0 * math.pi)
                    # Find the accumulator cell to vote for
                    n = int(round(rho + L) / stepRho)
                    # Add one vote
                    accumulator[n, m] += 1
                    # Record the point
                    accuDict[(n, m)].append((x, y))
    return accumulator, accuDict
  
# Main
if __name__ == "__main__":
    I = cv2.imread('../picture/line.png')
    # Canny edge detection
    edge = cv2.Canny(I, 50, 200)
    # Show the binary edge map
    cv2.imshow("edge", edge)
    # Hough line detection
    accumulator, accuDict = HTLine(edge, 1, 1)
    # Show the accumulator as a 3D wireframe
    rows, cols = accumulator.shape
    fig = plt.figure()
    # 3D axes (add_subplot works across Matplotlib versions)
    ax = fig.add_subplot(111, projection='3d')
    X, Y = np.mgrid[0:rows:1, 0:cols:1]
    surf = ax.plot_wireframe(X, Y, accumulator, cstride=1, rstride=1, color='gray')
    ax.set_xlabel(u"$\\rho$")
    ax.set_ylabel(u"$\\theta$")
    ax.set_zlabel("accumulator")
    ax.set_zlim3d(0, np.max(accumulator))
    # Show the accumulator as a gray-level image
    grayAccu = accumulator / float(np.max(accumulator))
    grayAccu = 255 * grayAccu
    grayAccu = grayAccu.astype(np.uint8)
    # Draw only the lines that received more than voteThresh votes
    voteThresh = 180
    for r in range(rows):
        for c in range(cols):
            if accumulator[r][c] > voteThresh:
                points = accuDict[(r, c)]
                # Connect the first and last voting points (blue in BGR)
                cv2.line(I, points[0], points[len(points) - 1], (255, 0, 0), 2)
    cv2.imshow('accumulator', grayAccu)
    # Show the original image with the detected lines drawn on it
    cv2.imshow("I", I)
    plt.show()
    cv2.imwrite('accumulator.jpg', grayAccu)
    cv2.imwrite('I.jpg', I)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

picture.image

Original image

picture.image

Detected lines

picture.image

3D view of the accumulator
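The triple loop in `HTLine` is easy to follow but slow in pure Python. As a rough sketch (not the author's implementation), the same voting scheme can be written with NumPy array operations. The function name `hough_accumulate` and the synthetic test image are illustrative assumptions, and the per-cell point lists (`accuDict`) are left out.

```python
import numpy as np


def hough_accumulate(edge, step_theta=1.0, step_rho=1.0):
    """Vote in (rho, theta) space for every non-zero pixel of a binary edge map."""
    rows, cols = edge.shape
    # Same bound as HTLine: image diagonal, rounded, plus one
    L = int(round(np.hypot(rows - 1, cols - 1))) + 1
    thetas = np.deg2rad(np.arange(0.0, 180.0, step_theta))
    num_rho = int(2 * L / step_rho) + 1
    ys, xs = np.nonzero(edge)
    # rho for every (edge pixel, theta) pair via broadcasting
    rhos = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)
    rho_idx = np.round((rhos + L) / step_rho).astype(int)
    # Guard against an index landing one past the last bin after rounding
    rho_idx = np.clip(rho_idx, 0, num_rho - 1)
    theta_idx = np.broadcast_to(np.arange(len(thetas)), rho_idx.shape)
    acc = np.zeros((num_rho, len(thetas)), dtype=np.int32)
    # np.add.at accumulates correctly even when indices repeat
    np.add.at(acc, (rho_idx.ravel(), theta_idx.ravel()), 1)
    return acc, thetas, L


# Synthetic edge map (assumption): one horizontal line y = 20 in a 50x50 image
edge = np.zeros((50, 50), np.uint8)
edge[20, :] = 255
acc, thetas, L = hough_accumulate(edge)
r, t = np.unravel_index(np.argmax(acc), acc.shape)
rho_peak = r - L  # step_rho is 1 here, so the bin index maps directly to rho
print(rho_peak, np.rad2deg(thetas[t]), acc[r, t])
```

For a horizontal line the strongest cell sits at θ = 90° with ρ equal to the row of the line, which matches $\rho = x\cos\theta + y\sin\theta$.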

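2.1.2 Boundary Direction Histogram

We start with image edge detection. Commonly used edge detection operators include:

Laplacian operator
Sobel operator
Prewitt operator
Canny operator
and others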

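An image consists of many discrete pixels, so the operators above approximate partial derivatives with finite differences. Among them, the Canny operator generally gives the best edge detection results. It first smooths the image with a Gaussian filter and then applies a first-order difference operator to the smoothed image (some textbook descriptions use the Roberts operator; the implementation later in this section uses Sobel), followed by non-maximum suppression and double-threshold edge linking.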
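To connect difference-based gradients with the boundary direction histogram that this section is named after, here is one possible sketch. It assumes central differences (`np.gradient`), an 8-bin histogram over orientations in [0°, 180°), and a magnitude threshold to select edge pixels. The function name, bin count, and threshold are illustrative choices, not values from the original text.

```python
import numpy as np


def edge_direction_histogram(gray, num_bins=8, mag_thresh=0.5):
    """Normalized histogram of gradient orientations at strong-gradient pixels."""
    gray = np.asarray(gray, dtype=float)
    # Central differences approximate the partial derivatives (axis 0 = y, axis 1 = x)
    dy, dx = np.gradient(gray)
    mag = np.hypot(dx, dy)
    # Orientation folded into [0, 180) degrees
    angle = np.rad2deg(np.arctan2(dy, dx)) % 180.0
    mask = mag > mag_thresh
    hist, _ = np.histogram(angle[mask], bins=num_bins, range=(0.0, 180.0))
    total = hist.sum()
    return hist / total if total else hist.astype(float)


# Synthetic image (assumption): a single vertical step edge
img = np.zeros((8, 8))
img[:, 4:] = 10.0
h = edge_direction_histogram(img)
print(h)
```

A vertical step edge has a purely horizontal gradient, so all of its mass falls into the first orientation bin.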

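1. Laplacian operator
The Laplacian is a second-order differential operator on n-dimensional Euclidean space.

picture.image

2. Sobel operator
The Sobel operator is a first-order differential operator. It uses the gradient values in the neighborhood of a pixel to compute the gradient at that pixel, and then keeps or discards the pixel according to a chosen rule:

picture.image

If the Sobel operator uses a 3×3 kernel in the x direction, the kernel responds most strongly to vertical edges; if it uses a 3×3 kernel in the y direction, the kernel responds most strongly to horizontal edges.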

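3. Prewitt operator
The Prewitt operator is also a first-order differential operator:

picture.image

If the Prewitt operator uses a 3×3 kernel in the x direction, the kernel responds most strongly to vertical edges; if it uses a 3×3 kernel in the y direction, the kernel responds most strongly to horizontal edges.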
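The directional behavior of the Sobel and Prewitt kernels can be seen on a tiny synthetic image. This is a minimal sketch using the commonly quoted 3×3 kernels; the helper `correlate3x3` and the test image are assumptions made for illustration, and no padding is applied.

```python
import numpy as np

# Commonly used 3x3 kernels; the y-direction kernels are the transposes
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T


def correlate3x3(img, kernel):
    """Valid-mode 3x3 correlation (output shrinks by one pixel on each side)."""
    rows, cols = img.shape
    out = np.zeros((rows - 2, cols - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + rows - 2, j:j + cols - 2]
    return out


# Synthetic image (assumption): vertical edge between columns 2 and 3
img = np.zeros((6, 6))
img[:, 3:] = 1.0

sobel_gx = correlate3x3(img, SOBEL_X)
sobel_gy = correlate3x3(img, SOBEL_Y)
prewitt_gx = correlate3x3(img, PREWITT_X)
prewitt_gy = correlate3x3(img, PREWITT_Y)
print(sobel_gx.max(), np.abs(sobel_gy).max(), prewitt_gx.max())
```

The x-direction kernels respond to the vertical edge while the y-direction kernels return zero everywhere. Sobel weights the center row by 2, so its peak response is larger than Prewitt's.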

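4. Canny operator
The Canny edge detector consists of four main steps.

1 Smooth the image with a Gaussian filter.
2 Compute the gradient magnitude and direction with finite differences of the first-order partial derivatives.
3 Apply non-maximum suppression to the gradient magnitude.
4 Detect and link edges with a double-threshold algorithm.

A Python implementation of Canny detection follows (it covers both the Canny and the Sobel steps):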


        
          
# -*- coding: utf-8 -*-  
import numpy as np  
import sys  
import math  
import cv2  
# cv2.Sobel is used for the gradient step and aliased as `sobel`
from cv2 import Sobel as sobel
  
  
# Edge detection
# Non-maximum suppression
def non_maximum_suppression_default(dx, dy):
    # Edge strength
    edgeMag = np.sqrt(np.power(dx, 2.0) + np.power(dy, 2.0))
    # rows = image height, cols = image width
    rows, cols = dx.shape
    # Gradient direction
    gradientDirection = np.zeros(dx.shape)
    # Edge strength after non-maximum suppression
    edgeMag_nonMaxSup = np.zeros(dx.shape)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # angle lies in [0, 180] or [-180, 0]
            angle = math.atan2(dy[r][c], dx[r][c]) / math.pi * 180
            gradientDirection[r][c] = angle
            # Left / right direction
            if (abs(angle) < 22.5 or abs(angle) > 157.5):
                if (edgeMag[r][c] > edgeMag[r][c - 1] and edgeMag[r][c] > edgeMag[r][c + 1]):
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
            # Upper-left / lower-right direction
            if (angle >= 22.5 and angle < 67.5 or (-angle > 112.5 and -angle <= 157.5)):
                if (edgeMag[r][c] > edgeMag[r - 1][c - 1] and edgeMag[r][c] > edgeMag[r + 1][c + 1]):
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
            # Up / down direction
            if ((angle >= 67.5 and angle <= 112.5) or (angle >= -112.5 and angle <= -67.5)):
                if (edgeMag[r][c] > edgeMag[r - 1][c] and edgeMag[r][c] > edgeMag[r + 1][c]):
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
            # Upper-right / lower-left direction
            if ((angle > 112.5 and angle <= 157.5) or (-angle >= 22.5 and -angle < 67.5)):
                if (edgeMag[r][c] > edgeMag[r - 1][c + 1] and edgeMag[r][c] > edgeMag[r + 1][c - 1]):
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
    return edgeMag_nonMaxSup
  
  
# Non-maximum suppression: comparison against interpolated neighbors
def non_maximum_suppression_Inter(dx, dy):
    # Edge strength
    edgeMag = np.sqrt(np.power(dx, 2.0) + np.power(dy, 2.0))
    # rows = image height, cols = image width
    rows, cols = dx.shape
    # Gradient direction
    gradientDirection = np.zeros(dx.shape)
    # Edge strength after non-maximum suppression
    edgeMag_nonMaxSup = np.zeros(dx.shape)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Skip pixels with zero gradient
            if dy[r][c] == 0 and dx[r][c] == 0:
                continue
            # angle lies in [0, 180] or [-180, 0]
            angle = math.atan2(dy[r][c], dx[r][c]) / math.pi * 180
            gradientDirection[r][c] = angle
            # Interpolate between upper-left and up, and between lower-right and down
            if (angle > 45 and angle <= 90) or (angle > -135 and angle <= -90):
                ratio = dx[r][c] / dy[r][c]
                leftTop_top = ratio * edgeMag[r - 1][c - 1] + (1 - ratio) * edgeMag[r - 1][c]
                rightBottom_bottom = (1 - ratio) * edgeMag[r + 1][c] + ratio * edgeMag[r + 1][c + 1]
                if edgeMag[r][c] > leftTop_top and edgeMag[r][c] > rightBottom_bottom:
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
            # Interpolate between upper-right and up, and between lower-left and down
            if (angle > 90 and angle <= 135) or (angle > -90 and angle <= -45):
                ratio = abs(dx[r][c] / dy[r][c])
                rightTop_top = ratio * edgeMag[r - 1][c + 1] + (1 - ratio) * edgeMag[r - 1][c]
                leftBottom_bottom = ratio * edgeMag[r + 1][c - 1] + (1 - ratio) * edgeMag[r + 1][c]
                if edgeMag[r][c] > rightTop_top and edgeMag[r][c] > leftBottom_bottom:
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
            # Interpolate between upper-left and left, and between lower-right and right
            if (angle >= 0 and angle <= 45) or (angle > -180 and angle <= -135):
                ratio = dy[r][c] / dx[r][c]
                rightBottom_right = ratio * edgeMag[r + 1][c + 1] + (1 - ratio) * edgeMag[r][c + 1]
                leftTop_left = ratio * edgeMag[r - 1][c - 1] + (1 - ratio) * edgeMag[r][c - 1]
                if edgeMag[r][c] > rightBottom_right and edgeMag[r][c] > leftTop_left:
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
            # Interpolate between upper-right and right, and between lower-left and left
            if (angle > 135 and angle <= 180) or (angle > -45 and angle <= 0):
                ratio = abs(dy[r][c] / dx[r][c])
                rightTop_right = ratio * edgeMag[r - 1][c + 1] + (1 - ratio) * edgeMag[r][c + 1]
                leftBottom_left = ratio * edgeMag[r + 1][c - 1] + (1 - ratio) * edgeMag[r][c - 1]
                if edgeMag[r][c] > rightTop_right and edgeMag[r][c] > leftBottom_left:
                    edgeMag_nonMaxSup[r][c] = edgeMag[r][c]
    return edgeMag_nonMaxSup
  
  
# Check whether a coordinate lies inside the image
def checkInRange(r, c, rows, cols):
    return 0 <= r < rows and 0 <= c < cols
  
  
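def trace(edgeMag_nonMaxSup, edge, lowerThresh, r, c, rows, cols):
    # Mark the pixel as a confirmed edge point, then follow every 8-neighbor
    # whose strength is at least the low threshold.
    # Note: the recursion depth grows with the length of the traced edge.
    if edge[r][c] == 0:
        edge[r][c] = 255
        for i in range(-1, 2):
            for j in range(-1, 2):
                if checkInRange(r + i, c + j, rows, cols) and edgeMag_nonMaxSup[r + i][c + j] >= lowerThresh:
                    trace(edgeMag_nonMaxSup, edge, lowerThresh, r + i, c + j, rows, cols)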
  
  
# Hysteresis thresholding
def hysteresisThreshold(edge_nonMaxSup, lowerThresh, upperThresh):
    # rows = image height, cols = image width
    rows, cols = edge_nonMaxSup.shape
    edge = np.zeros(edge_nonMaxSup.shape, np.uint8)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # At or above the high threshold: a confirmed edge point,
            # used as the starting point for extending the edge
            if edge_nonMaxSup[r][c] >= upperThresh:
                trace(edge_nonMaxSup, edge, lowerThresh, r, c, rows, cols)
            # Below the low threshold: rejected
            if edge_nonMaxSup[r][c] < lowerThresh:
                edge[r][c] = 0
    return edge
  
  
# Main
if __name__ == "__main__":
    image = cv2.imread("../picture/house.png")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # ------- Canny edge detection -----------
    # Step 1: convolution with the Sobel kernels
    image_sobel_x = sobel(image, cv2.CV_64F, 1, 0)
    image_sobel_y = sobel(image, cv2.CV_64F, 0, 1)
    # Edge strength: square root of the sum of squares of the two responses
    edge = np.sqrt(np.power(image_sobel_x, 2.0) + np.power(image_sobel_y, 2.0))
    # Gray-level display of the edge strength
    edge[edge > 255] = 255
    edge = edge.astype(np.uint8)
    cv2.imshow("sobel edge", edge)
    # Step 2: non-maximum suppression
    edgeMag_nonMaxSup = non_maximum_suppression_default(image_sobel_x, image_sobel_y)
    edgeMag_nonMaxSup[edgeMag_nonMaxSup > 255] = 255
    edgeMag_nonMaxSup = edgeMag_nonMaxSup.astype(np.uint8)
    cv2.imshow("edgeMag_nonMaxSup", edgeMag_nonMaxSup)
    # Step 3: hysteresis thresholding with two thresholds gives the Canny edges.
    # Hysteresis decides whether pixels between the low and high thresholds are edges.
    lowerThresh = 60
    upperThresh = 180
    edge = hysteresisThreshold(edgeMag_nonMaxSup, lowerThresh, upperThresh)
    cv2.imshow("canny", edge)
    cv2.imwrite("canny.jpg", edge)
    # ------- Comparison of single thresholds with hysteresis thresholding ------
    # At or above the high threshold: white, a confirmed edge
    EDGE = 255
    # Below the low threshold: black, not an edge, rejected
    NOEDGE = 0
    # Between the two thresholds: gray, a possible edge
    POSSIBLE_EDGE = 128
    tempEdge = np.copy(edgeMag_nonMaxSup)
    rows, cols = tempEdge.shape
    for r in range(rows):
        for c in range(cols):
            if tempEdge[r][c] >= upperThresh:
                tempEdge[r][c] = EDGE
            elif tempEdge[r][c] < lowerThresh:
                tempEdge[r][c] = NOEDGE
            else:
                tempEdge[r][c] = POSSIBLE_EDGE
    cv2.imshow("tempEdge", tempEdge)
    # Single threshold at the low value
    lowEdge = np.copy(edgeMag_nonMaxSup)
    lowEdge[lowEdge >= lowerThresh] = 255
    lowEdge[lowEdge < lowerThresh] = 0
    cv2.imshow("lowEdge", lowEdge)
    # Single threshold at the high value
    upperEdge = np.copy(edgeMag_nonMaxSup)
    upperEdge[upperEdge >= upperThresh] = 255
    upperEdge[upperEdge < upperThresh] = 0
    cv2.imshow("upperEdge", upperEdge)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

picture.image

Original image

picture.image

Edge strength map

picture.image

Detection result
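The recursive `trace` function in the listing can run into Python's recursion limit on long edges. One possible alternative, sketched here under that assumption, replaces the recursion with an explicit stack. The function name `hysteresis_threshold_iterative` and the small test array are illustrative. Unlike the listing, this sketch also seeds from border pixels.

```python
import numpy as np


def hysteresis_threshold_iterative(mag, lower, upper):
    """Double-threshold edge linking with an explicit stack instead of recursion."""
    rows, cols = mag.shape
    edge = np.zeros(mag.shape, np.uint8)
    # Seeds: pixels at or above the high threshold
    stack = [(int(r), int(c)) for r, c in zip(*np.nonzero(mag >= upper))]
    while stack:
        r, c = stack.pop()
        if edge[r, c]:
            continue  # already marked
        edge[r, c] = 255
        # Follow 8-neighbors whose strength is at least the low threshold
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not edge[rr, cc] and mag[rr, cc] >= lower):
                    stack.append((rr, cc))
    return edge


# Hypothetical suppressed-magnitude map: one strong pixel with two weak
# neighbors chained to it, plus an isolated weak pixel
mag = np.array([
    [0,   0,   0,   0, 0],
    [0, 200, 100, 100, 0],
    [0,   0,   0,   0, 0],
    [0, 100,   0,   0, 0],
    [0,   0,   0,   0, 0],
])
edge = hysteresis_threshold_iterative(mag, lower=60, upper=180)
print(edge)
```

Weak pixels connected to a strong pixel are kept, while the isolated weak pixel in the fourth row is dropped, which is the behavior hysteresis thresholding is meant to provide.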

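2.2 Fourier Feature Descriptors

The Fourier feature operator, also called the Fourier shape descriptor, applies a discrete Fourier transform to the contour of the target boundary to obtain a quantitative description of the boundary shape. It moves the boundary signal from the spatial domain to the frequency domain.

Fourier shape descriptor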

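Once the starting point and the direction on the boundary of the target region are fixed, the boundary can be described by a sequence of coordinate pairs. Suppose the boundary has $N$ points and the starting point is $(x_0, y_0)$. Traversing the boundary clockwise gives the coordinate sequence:

picture.image

In general, if the boundary is traced from some starting point, one clockwise circuit around it can be represented by a complex function. In other words, the coordinates of the boundary points can be written as the complex numbers $z(k) = x(k) + j\,y(k)$, $k = 0, 1, \dots, N-1$:

picture.image

This turns the two-dimensional coordinate sequence into a one-dimensional one. The complex sequence $z(k)$ can then be represented by the coefficients $a(u)$ of its one-dimensional discrete Fourier transform:

picture.image

Here $a(u)$ are the Fourier descriptors of the image boundary. Conversely, applying the inverse Fourier transform to $a(u)$ recovers the original coordinate sequence (approximated using only the first L Fourier coefficients):

picture.image

Low-order coefficients capture the overall shape of the boundary, while high-order coefficients capture its fine details. After normalization (for example, dropping $a(0)$, dividing by the magnitude of a reference coefficient, and keeping only magnitudes), Fourier descriptors are insensitive to translation, scale, and rotation.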
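The listing that follows computes the 2D Fourier spectrum of an image rather than boundary descriptors, so a short sketch of the descriptor computation itself may help. It assumes NumPy's DFT convention (`np.fft.fft` without a $1/N$ factor, `np.fft.ifft` with it), and the function names, the normalization by the largest non-DC magnitude, and the test contour are all illustrative choices. When truncating, the sketch keeps a symmetric band of low positive and negative frequencies, which is one common reading of "the first L coefficients".

```python
import numpy as np


def fourier_descriptors(points):
    """DFT coefficients a(u) of the complex boundary z(k) = x(k) + j*y(k)."""
    pts = np.asarray(points, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]
    return np.fft.fft(z)


def normalized_magnitudes(a):
    """One possible signature that ignores translation, scale, and rotation."""
    mags = np.abs(a[1:])      # dropping a(0) removes the translation
    return mags / mags.max()  # dividing removes scale; magnitudes ignore rotation


def reconstruct(a, num_low):
    """Rebuild the boundary from the num_low lowest frequencies on each side."""
    kept = np.zeros_like(a)
    kept[:num_low + 1] = a[:num_low + 1]  # a(0) and the low positive frequencies
    if num_low > 0:
        kept[-num_low:] = a[-num_low:]    # the low negative frequencies
    z = np.fft.ifft(kept)
    return np.column_stack([z.real, z.imag])


# Hypothetical closed contour sampled at N points
N = 64
t = 2 * np.pi * np.arange(N) / N
pts = np.column_stack([4 * np.cos(t) + 0.5 * np.cos(3 * t), 2 * np.sin(t)])

# The same contour rotated by 0.7 rad, scaled by 2.5, and shifted
phi, s = 0.7, 2.5
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
pts2 = s * pts @ R.T + np.array([10.0, -4.0])

sig1 = normalized_magnitudes(fourier_descriptors(pts))
sig2 = normalized_magnitudes(fourier_descriptors(pts2))
rec = reconstruct(fourier_descriptors(pts), num_low=3)
print(np.allclose(sig1, sig2), np.allclose(rec, pts))
```

The raw coefficients change under rotation, scaling, and translation; only the normalized magnitudes stay the same. The test contour contains frequencies up to 3, so keeping three low frequencies on each side reproduces it.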


        
          
# -*- coding: utf-8 -*-  
import sys  
import numpy as np  
import cv2  
import math  
  
  
# Fast Fourier transform
def fft2Image(src):
    # Number of rows and columns
    r, c = src.shape[:2]
    # Optimal padded sizes for the fast Fourier transform
    rPadded = cv2.getOptimalDFTSize(r)
    cPadded = cv2.getOptimalDFTSize(c)
    # Zero-pad along the bottom and right edges
    fft2 = np.zeros((rPadded, cPadded, 2), np.float32)
    fft2[:r, :c, 0] = src
    # Fast Fourier transform, computed in place
    cv2.dft(fft2, fft2, cv2.DFT_COMPLEX_OUTPUT)
    return fft2
  
  
# Fourier amplitude spectrum
def amplitudeSpectrum(fft2):
    # Amplitude = sqrt(real^2 + imag^2)
    real2 = np.power(fft2[:, :, 0], 2.0)
    Imag2 = np.power(fft2[:, :, 1], 2.0)
    amplitude = np.sqrt(real2 + Imag2)
    return amplitude
  
  
# Gray-level display of the amplitude spectrum
def graySpectrum(amplitude):
    # Contrast stretching with a logarithm
    amplitude = np.log(amplitude + 1.0)
    # Normalize to [0, 1] for gray-level display
    spectrum = np.zeros(amplitude.shape, np.float32)
    cv2.normalize(amplitude, spectrum, 0, 1, cv2.NORM_MINMAX)
    return spectrum
  
  
# Phase spectrum
def phaseSpectrum(fft2):
    # Number of rows and columns
    rows, cols = fft2.shape[:2]
    # Phase angle in radians
    phase = np.arctan2(fft2[:, :, 1], fft2[:, :, 0])
    # Convert the phase angle to degrees in [-180, 180]
    spectrum = phase / math.pi * 180
    return spectrum
  
  
# Main
if __name__ == "__main__":
    image = cv2.imread('../picture/fuliye.png', 0)
    # Show the original image
    cv2.imshow("image", image)
    # Fast Fourier transform
    fft2 = fft2Image(image)
    # Amplitude spectrum
    amplitude = amplitudeSpectrum(fft2)
    amc = np.copy(amplitude)
    amc[amc > 255] = 255
    amc = amc.astype(np.uint8)
    # Gray-level display of the amplitude spectrum
    ampSpectrum = graySpectrum(amplitude)
    ampSpectrum *= 255
    ampSpectrum = ampSpectrum.astype(np.uint8)
    cv2.imshow("amplitudeSpectrum", ampSpectrum)
    # Gray-level display of the phase spectrum:
    # map [-180, 180] degrees to [0, 1] so that imshow renders it correctly
    phaseSpe = phaseSpectrum(fft2)
    cv2.imshow("phaseSpectrum", (phaseSpe + 180.0) / 360.0)
    # ------- Centering the Fourier amplitude spectrum -------
    # Step 1: multiply the image by (-1)^(r+c)
    rows, cols = image.shape
    fimg = image.astype(np.float32)
    rr, cc = np.indices(image.shape)
    fimg[(rr + cc) % 2 == 1] *= -1
    # Step 2: fast Fourier transform
    imgfft2 = fft2Image(fimg)
    # Step 3: Fourier amplitude spectrum
    amSpe = amplitudeSpectrum(imgfft2)
    # Gray-level display of the amplitude spectrum
    graySpe = graySpectrum(amSpe)
    cv2.imshow("amSpe", graySpe)
    graySpe *= 255
    graySpe = graySpe.astype(np.uint8)
    cv2.imwrite("centerAmp.jpg", graySpe)
    # Step 4: gray-level display of the phase spectrum
    phSpe = phaseSpectrum(imgfft2)
    cv2.imshow("phSpe", (phSpe + 180.0) / 360.0)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
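The centering step in the listing multiplies the image by $(-1)^{r+c}$ before the transform. As a quick check, assuming an image with even dimensions and using NumPy's FFT instead of `cv2.dft`, this gives the same result as shifting the spectrum afterwards with `np.fft.fftshift`. The random test image is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 6))  # hypothetical small image with even dimensions

# Centering by modulation: multiply by (-1)^(r+c), then transform
r, c = np.indices(img.shape)
sign = np.where((r + c) % 2 == 0, 1.0, -1.0)
centered_by_sign = np.fft.fft2(img * sign)

# Centering by shifting: transform, then move the zero frequency to the center
centered_by_shift = np.fft.fftshift(np.fft.fft2(img))

same = np.allclose(centered_by_sign, centered_by_shift)
print(same)
```

After either operation the zero-frequency term, which equals the sum of all pixel values, sits at index `(rows // 2, cols // 2)`. For odd dimensions the two approaches no longer coincide exactly.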