Graph Network for Object Detection



I am considering the possibility of object detection using a graph neural network.

The basic idea is to represent the correlation among pixels as a graph.

An (M x N)-pixel image can be represented by an (M x N) x (M x N) pixel-correlation matrix, just like the adjacency matrix of a complete graph. Pixel pairs that are local and of high intensity (255 for 8-bit monochrome, for example) should be strongly correlated, so the corresponding matrix element should have a higher value. Pixel pairs that are far apart and of low intensity should be weakly correlated, so the element should have a lower value.

To simplify the problem, let us consider a monochrome image (a single channel) whose pixels take values from 0 to 255. Then the correlation matrix can be obtained by the following procedure:

  1. Apply an edge filter, or some convolution, to the source image.
  2. Evaluate the distance between pixels and the intensities of the pixels.
  3. Fill the correlation values into the matrix.
  4. A particular object should then produce a particular pattern in the matrix, which serves as a feature.
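The steps above can be sketched in NumPy. The correlation function itself (product of edge intensities weighted by a Gaussian of the pixel distance) is my assumption for illustration, not a fixed choice:

```python
import numpy as np

def correlation_matrix(img, sigma=2.0):
    """Build an (M*N) x (M*N) pixel-correlation matrix.

    The correlation function here is an assumption: the product of
    edge-filtered intensities, weighted by a Gaussian of the
    pixel-to-pixel distance, so nearby bright pairs score high.
    """
    m, n = img.shape
    # Step 1: a simple edge filter (gradient magnitude) as the convolution.
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy)

    # Step 2: distances between all pixel pairs.
    ys, xs = np.mgrid[0:m, 0:n]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)   # (M*N, 2)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))              # (M*N, M*N)

    # Step 3: fill the matrix: strong for nearby, high-intensity pairs.
    inten = edges.ravel()
    return np.outer(inten, inten) * np.exp(-(dist ** 2) / (2 * sigma ** 2))

img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 255            # a bright square "object"
C = correlation_matrix(img)
print(C.shape)                 # (64, 64)
```

By construction the matrix is symmetric, and the bright square leaves a block-like pattern of large entries among its member pixels.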

Approximately similar patterns should correspond to the same object.
Although a graph network usually has correlations only between neighboring nodes, in this case a node affects all other nodes. The computational complexity is very high, so a sparse matrix representation (CSR, for example) is required to skip unnecessary operations.
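To make the CSR idea concrete, here is a minimal pure-Python sketch of the CSR layout (values, column indices, row pointers) and a matrix-vector product that touches only the stored nonzeros; in practice a library type such as SciPy's `csr_matrix` would be used instead:

```python
def to_csr(dense):
    """Convert a dense matrix (list of lists) to CSR arrays.

    CSR stores only nonzero values, their column indices, and row
    pointers, so row-wise operations can skip zero entries entirely.
    """
    data, indices, indptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr

def csr_matvec(data, indices, indptr, x):
    """y = A @ x using only the stored nonzeros."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# A small thresholded correlation matrix: mostly zeros.
A = [[1.0, 0.0, 0.0],
     [0.0, 0.5, 0.2],
     [0.0, 0.0, 0.0]]
data, indices, indptr = to_csr(A)
print(data, indices, indptr)   # [1.0, 0.5, 0.2] [0, 1, 2] [0, 1, 3, 3]
y = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
```

For a correlation matrix where most distant or dark pixel pairs contribute near-zero values, thresholding and storing in CSR turns O((MN)^2) work into work proportional to the number of significant entries.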

Since we only check correlations between pixels, the elements of the matrix should be independent of the geometry. So this approach should be robust against rotation, translation, and missing object parts, without relying on pooling.
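The translation part of this claim is easy to check: the multiset of pairwise distances between bright pixels does not change when the object is shifted (exact rotation invariance does not hold on a pixel grid, only approximately). A small sketch:

```python
import numpy as np

def distance_histogram(img, thresh=128):
    """Sorted pairwise distances between bright pixels.

    Because it uses only relative distances, the result is unchanged
    when the object is translated within the image.
    """
    ys, xs = np.nonzero(img > thresh)
    pts = np.stack([ys, xs], axis=1).astype(float)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(pts), k=1)   # upper triangle: each pair once
    return np.sort(d[iu])

img = np.zeros((16, 16), dtype=np.uint8)
img[2:5, 2:5] = 255                           # object at the top-left
shifted = np.roll(img, (7, 6), axis=(0, 1))   # same object, translated

print(np.allclose(distance_histogram(img),
                  distance_histogram(shifted)))   # True
```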

By learning this matrix, the method may be able to detect objects.
Strictly speaking, this is probably not a graph network, but I want to explore the possibility of this method.

Any suggestion is welcome.
I am also going to attend the study event on the 19th of this month.



I found a paper with a similar idea, published at NeurIPS 2018 [1]. The main idea is to apply a few CNN layers and group the pixels in the feature maps into vertices of a graph. Edges are created based on some distance measure between the vertices. A GNN (e.g., a GCN) is then run on this graph to obtain a new feature representation, and the resulting graph is mapped back to feature maps.
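A rough sketch of that pipeline in NumPy, with random weights standing in for trained parameters and all shapes assumed (this is not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Feature map from a few CNN layers: H*W locations, C channels (assumed shapes).
H, W, C, K = 8, 8, 16, 4                 # K graph vertices
feat = rng.standard_normal((H * W, C))

# 1. Group pixels into K vertices via a soft assignment
#    (random projection here stands in for learned parameters).
assign = softmax(feat @ rng.standard_normal((C, K)), axis=1)  # (HW, K)
vertices = assign.T @ feat                                    # (K, C)

# 2. Build edges from a distance measure between vertices.
d = ((vertices[:, None, :] - vertices[None, :, :]) ** 2).sum(-1)
adj = np.exp(-d)                                              # (K, K)
adj /= adj.sum(axis=1, keepdims=True)                         # row-normalize

# 3. One GCN-style step: aggregate neighbors, transform, ReLU.
Wg = rng.standard_normal((C, C))
vertices = np.maximum(adj @ vertices @ Wg, 0.0)

# 4. Map the graph back onto the feature map.
feat_out = assign @ vertices                                  # (HW, C)
print(feat_out.reshape(H, W, C).shape)                        # (8, 8, 16)
```

The appeal is that the GCN step exchanges information between distant image regions through only K vertices, instead of all HW x HW pixel pairs.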

[1] “Beyond Grids: Learning Graph Representations for Visual Recognition”, Yin Li and Abhinav Gupta


I just remembered this paper that I stumbled upon a while ago. I haven’t read it completely but thought you might be interested. It was published in ICLR 2017.

Learning Graphical State Transitions



Very sorry for my late reply; I had not been checking here, only the MLT Slack.

I will check the papers, thank you very much!