Clasificación automática de anastomosis mediante redes neuronales convolucionales en video fetoscópico
An Automatic Classification of Anastomosis by Convolutional Neural Networks in Fetoscopic Video
Classificação automática de anastomoses usando redes neurais convolucionais em vídeo fetoscópico

Vol. 12, Núm. 22, Enero-Junio 2021, e178



Abstract
Twin-twin transfusion syndrome (TTTS) is the result of uneven blood flow through placental vascular anastomoses (blood vessel connections) that link the two fetal circulations. Vascular anastomoses in the shared placenta are present in virtually all monochorionic twin pregnancies (MCs), but only about 10% lead to twin-twin transfusion syndrome. Without intervention, the condition is often fatal for both twins. One treatment for TTTS is the placental laser procedure known as fetal surgery, which consists, broadly, of splitting the placenta in two by laser cauterization of the blood vessels between the fetuses, thus balancing the blood flows.
Currently, fetoscopic surgery is performed frequently in Mexico, and the appropriate classification of anastomoses is vital to this surgery, since it represents the most recommended treatment. However, due to its degree of complexity, this surgical intervention presents multiple difficulties: the fetuses move during the procedure, the orientation of the video is not suitable for accurate analysis, and the field of view of the fetoscope is very small. Therefore, a tool is needed that helps the physician differentiate and classify anastomoses more appropriately. The objective of this work is to present the development of a computational tool that contributes to the automatic classification of anastomoses in fetoscopic video through convolutional neural networks, serving as support for the inexperienced physician during the training stage.
A DataSet (image set) was built from fetoscopic videos, first unclassified and then cataloged into three categories selected in conjunction with experts, using a computational tool created specifically for this purpose (VideoLabel). The data augmentation technique served to build artificial images from the real, already classified ones, since the number of tagged images was not sufficient. Likewise, the AlexNet architecture was selected for transfer learning and trained, obtaining results above 90% effectiveness in the classifications made by the computational tool. These data support the conclusion that, without a tool that allows first-time physicians to train in the identification of anastomoses before fetal surgery, their training takes longer, since it is done in situ during each fetal surgery. As a result of this research, software was designed with which it is feasible to automatically classify anastomoses from a fetoscopic video through a convolutional neural network, with promising results as a tool to support physicians in their training stage, allowing them to train in a more agile and less time-consuming way.

Introduction
Twin-twin transfusion syndrome (TTTS) is the result of uneven blood flow through placental vascular anastomoses (connections of blood vessels) that link the two fetal circulations. These shared placental vascular anastomoses are present in virtually all monochorionic twins (MCs), but only about 10% lead to twin-to-twin transfusion syndrome.
Net blood transfusion occurs at the expense of the so-called donor twin, which becomes hypovolemic and anemic, while the recipient twin becomes hypervolemic and polycythemic. This leads to the rapid development of a significant mismatch in amniotic fluid volume between twins that without proper intervention can be fatal for both twins (Spruijt, Lopriore, Steggerda, Slaghekke, & van Klink, 2019).
Twin-to-twin transfusion syndrome is one of the deadliest circumstances in fetal medicine today, and it remains a great challenge for obstetricians and neonatologists around the world. In this sense, the best treatment is fetoscopic laser surgery. The technique, currently in force, uses a laser to interrupt blood flow through the vascular communications, in order to coagulate only the anastomotic vessels in the vascular equator, which lies at a certain distance from the intertwin membrane due to the discordant amniotic fluid volumes between the twins (Glennon, Shemer, Palma-Dias and Umstad, 2016).
Laser surgery for twin-to-twin transfusion syndrome can be divided into two fundamental steps: 1) endoscopic identification of the placental vascular anastomoses, and 2) laser ablation of the anastomoses. Therefore, the goal of laser surgery is to correctly identify and remove all vascular anastomoses. This raises two questions: 1) can all placental vascular anastomoses be identified? And 2) can all placental vascular anastomoses be removed? (Quintero, Kontopoulos and Chmait, 2016). Now, focusing on endoscopic detection of placental vascular anastomoses, the following question arises: how can the anastomoses be identified? Some authors suggested that anastomoses could be distinguished by their appearance, that is, by specific patterns and angles, with drawings intended to aid this process; in other words, identification based on pattern recognition. Unfortunately, this was not possible for clinicians due to the large number of patterns that anastomoses can have (De Lia, Kuhlmann, Cruikshank and O'Bee, 1993).
From the above, the following research questions arise: can a computational tool help in the identification of anastomoses within a fetoscopic video, and how accurate can this identification be? The present work hypothesizes that a computational technique can identify anastomoses with a certain percentage of accuracy.
Distinguishing the anastomoses to perform an adequate intervention in laser surgery for twin-twin transfusion syndrome constitutes a problem for the less experienced physician. Therefore, in order to meet this need, the following objectives are set: to demonstrate that a computational tool can help in the identification of anastomoses in a fetoscopic video with a certain percentage of accuracy, and to create a computational tool that helps in the automatic classification of anastomoses, in such a way that it serves the novice doctor as a support tool in the training and habilitation stage.
The science of solving clinical problems through the analysis of images generated in clinical practice is known as medical image analysis, and its objective is to extract information efficiently to improve clinical diagnosis. This article highlights this advance, which consists of the application of machine learning techniques to the analysis of medical images. It presents, in the first section, a description of the research work carried out, with several sections that give clarity about the process followed to achieve the planned objectives, highlighting the way in which deep learning is used successfully as a machine learning tool, where a neural network is able to learn characteristics automatically. Among deep learning techniques, deep convolutional networks are actively used for medical image analysis. This includes application areas such as segmentation, abnormality detection, disease classification, computer-aided diagnosis, and retrieval (Anwar et al., 2018). But what exactly is a convolutional neural network?
A convolutional neural network is a type of deep learning model for processing data that has a grid structure, such as images, inspired by the organization of the animal visual cortex and designed to automatically and adaptively learn spatial hierarchies of features, from low- to high-level patterns. A convolutional neural network is a mathematical construct generally made up of three types of layers (or building blocks): convolution, pooling, and fully connected layers. The first two perform feature extraction, while the third, a fully connected layer, maps the extracted features to the final output, such as a classification.
A convolution layer plays a key role in the convolutional neural network; it is made up of a stack of mathematical operations, such as convolution, a specialized type of linear operation. In digital images, pixel values are stored in a two-dimensional grid, that is, an array of numbers, and a small grid of parameters called the kernel, a tunable feature extractor, is applied at each image position, which makes convolutional neural networks highly efficient for image processing, as a feature can appear anywhere in the image. As one layer feeds its output to the next, the extracted features become hierarchical and progressively more complex. The process of optimizing parameters such as the kernels is called training, and is performed to minimize the difference between the outputs and the ground-truth labels through optimization algorithms such as backpropagation and gradient descent, among others (Yamashita, Nishio, Do and Togashi, 2018).
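As an illustration of the convolution operation just described (not code from the study), the following pure-Python sketch slides a small kernel over a grid of pixel values; as in deep learning libraries, the kernel is applied without flipping:

```python
def convolve2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1) over a grid of pixel values."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of kernel and image patch, summed.
            acc = sum(kernel[m][n] * image[i + m][j + n]
                      for m in range(kh) for n in range(kw))
            row.append(acc)
        output.append(row)
    return output

# A 3x3 vertical-edge kernel applied to a 4x4 image yields a 2x2 feature map.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
print(convolve2d(image, kernel))  # [[3, 3], [3, 3]]
```

The strong responses mark the vertical edge between the bright and dark halves of the toy image, which is exactly the kind of low-level feature the early convolution layers learn.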

Method
This section explains the methodology followed to solve the problem presented, from the collection of the input data, through its treatment and transformation, to the proposal of a tool that helps solve it; Figure 2 shows the methodology in graphic form.

Create the data source
In the context of this work, the data source consisted of videos in MPG format, recorded during surgical interventions on real patients undergoing fetal surgery. The videos were extracted directly from the fetoscope by the physician in charge of the intervention and include the following data in an XML or HTML file: date, treatment, surgeon, patient ID, date of birth, and gender. There are generally N videos per patient, named Video_001.mpg, …, Video_00N.mpg; for this research, the N videos of 12 patients were accessed.
Subsequently, a program was created to join the separate videos of each patient (12 complete videos were obtained from this process) and convert them to MP4 format, for later classification of anastomoses by the expert physician with the VideoLabel tool.
In addition, the frames of each of the 12 complete videos were extracted and renamed with a suitable nomenclature to form an unclassified image dataset. The nomenclature was as follows:

To conclude this phase, a program was developed in MatLab that reviewed the entire image directory and automatically cropped each image so as to keep only the image itself and eliminate the excess. This was done because the work involves thousands of images that must be processed by the convolutional neural network; this segment-removal task saves time and results in a DataSet of unclassified images.
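The automatic cropping step can be sketched in a few lines. The original program was written in MatLab, so this pure-Python version, which operates on a nested list of pixel values with assumed crop coordinates, is only an illustration of the idea:

```python
def crop(image, top, left, height, width):
    """Keep only the requested sub-grid of pixels, discarding the excess border."""
    return [row[left:left + width] for row in image[top:top + height]]

# Toy 4x6 frame with a 2-column black (zero) border on each side.
frame = [[0, 0, 5, 6, 0, 0],
         [0, 0, 7, 8, 0, 0],
         [0, 0, 9, 1, 0, 0],
         [0, 0, 2, 3, 0, 0]]
print(crop(frame, 0, 2, 4, 2))  # [[5, 6], [7, 8], [9, 1], [2, 3]]
```

Applied over a whole directory, such a routine removes the unused border from thousands of frames before they reach the convolutional neural network.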

Manual labeling
For the labeling of the image DataSet, software was developed to help classify the images (called VideoLabel). This classifier software was used by medical experts with knowledge of the subject; its operation, in general, consists of reading an already processed video. The software allows the expert doctor, upon identifying an anastomosis, to pause the video, mark the item, and save the type of anastomosis identified, while internally the program saves the classified image. The anastomosis classes already loaded in the VideoLabel program are the following: normal, arterio-venous, veno-arterial, arterio-arterial, veno-venous, placental border, interfetal membrane, insertion of the donor twin's cord, insertion of the recipient twin's cord, and capillary.
The output of the VideoLabel program is the set of classified images, stored in a folder, that make up the DataSet of images labeled by the expert. This research worked with the arterio-venous, capillary, and veno-arterial categories; Figure 2 shows the three types of anastomoses used. Likewise, it is difficult to assemble a large image DataSet, since the time the expert physician can spend on manual labeling is limited. The next section specifies the quantities of images collected and tagged by the expert.

Data augmentation
In general, convolutional neural networks are too deep to train from scratch with small data sets (at least a thousand classified images per category are needed). For this reason, transfer learning and data augmentation techniques are recommended. Data augmentation consists of artificially enlarging the set of training images by applying various transformations to the original images, such as modifying the brightness, scaling, zooming, rotating, reflecting vertically and horizontally, etc. The images obtained must be as acceptably real as possible, so that an external viewer cannot distinguish between an image created by data augmentation and an original one (Gómez-Ros et al., 2019).
Although there are various strategies for data augmentation, the previous ones have been explored only with natural images, not with medical images. Among the strategies for augmenting medical image data, the following are recommended, since they gave the best results by increasing the effectiveness percentage in the training of convolutional neural networks: flips, rotations, and Gaussian blur (Hussain, Gimenez, Yi and Rubin, 2018).
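A minimal sketch of the flip and rotation transformations mentioned above, in pure Python on a small pixel grid (the study's own augmentation pipeline is not shown here, and Gaussian blur is omitted):

```python
import random

def hflip(image):
    """Horizontal flip: mirror each pixel row."""
    return [row[::-1] for row in image]

def vflip(image):
    """Vertical flip: reverse the order of the rows."""
    return image[::-1]

def rot90(image):
    """Rotate the pixel grid 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Build one artificial image by applying a randomly chosen transformation."""
    return random.choice([hflip, vflip, rot90])(image)

img = [[1, 2],
       [3, 4]]
print(hflip(img))   # [[2, 1], [4, 3]]
print(vflip(img))   # [[3, 4], [1, 2]]
print(rot90(img))   # [[3, 1], [4, 2]]
```

Each transformation preserves the content of the image while changing its presentation, which is why an external viewer should not be able to tell an augmented frame from an original one.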

[Figure 2. Anastomosis categories: arterio-venous, capillary, veno-arterial]
In this investigation, 2320 artificial images were created to increase the total number of images per category; Table 1 shows the set of real and artificial images that make up the DataSet.

Transfer learning
Transfer learning is a common and effective strategy for training a network with a small data set: a network pre-trained on an extremely large data set, such as AlexNet, trained on 1.2 million images across 1000 classes, is reused and applied to the task of interest. The assumption underlying transfer learning is that generic features (that is, what is learned on a large enough data set) can be shared between seemingly disparate data sets. This portability of learned generic features is a unique advantage of deep learning, useful in various domain tasks with small data sets (Yamashita et al., 2018). Accordingly, transfer learning was performed from the pre-trained AlexNet network, adjusting its last layers (this is explained in the following sections of this article).
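The idea of reusing frozen generic features and retraining only the replaced final layer can be illustrated with a toy model. Everything here is hypothetical: the two-feature extractor stands in for AlexNet's convolutional layers, and the tiny data set stands in for the fetoscopic images.

```python
import math

# Hypothetical frozen "pre-trained" feature extractor: in AlexNet these would
# be the convolutional layers learned on 1.2 million images.
def extract_features(x):
    return [x[0] + x[1], x[0] - x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_last_layer(data, lr=0.5, epochs=200):
    """Train only the weights of the replaced final layer; the extractor is frozen."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, label in data:
            f = extract_features(x)            # frozen forward pass
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            for i in range(len(w)):            # gradient descent on last layer only
                w[i] -= lr * (p - label) * f[i]
    return w

# Hypothetical two-class training set standing in for the labeled frames.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
w = train_last_layer(data)

def predict(x):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, extract_features(x))))
```

Because only the final layer's two weights are updated, the small data set suffices; this is the same reason transfer learning from AlexNet works with a few thousand labeled frames.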

Convolutional neural network architecture
For some years now, various models of deep convolutional neural networks have been developed. The pre-trained AlexNet network was used for this research work, as suggested in Tajbakhsh et al. It should be noted that transfer learning trains only the two layers that were modified according to our needs, not the entire network; Figure 3 shows the final layer architecture of the convolutional neural network. The total number of images (real and artificial) used to train the convolutional neural network was 2465. In this regard, it is important to note that, to test the trained network, 23 previously unseen images were used (that is, images the network did not know).
The activities followed to train a convolutional neural network are listed below:
1. Initialize all the parameters or weights of the first layers of the network with random values.
2. Take a set of training images and feed them to the model.
3. Compute the total error of the probabilities produced by the model.
4. Backpropagate to compute the error gradient for all the weights in the network, and use gradient descent to modify these values and decrease the output error (Quintero, Merchán, Cornejo and Sánchez Galán, 2018).
An important parameter is the number of epochs the convolutional neural network will train for; an epoch refers to a single pass through the complete training set (Schilling, 2016). Knowing the number of epochs for training a convolutional neural network on medical images is an important decision parameter; some authors recommend 50 epochs (Kumar, Kim, Lyndon, Fulham and Feng, 2017; Petscharnig and Schöffmann, 2018).
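The four steps above, together with the epoch loop, can be sketched on a deliberately tiny model: a single weight trained with squared error, not the network used in the study.

```python
import random

random.seed(0)
w = random.uniform(-1, 1)                       # step 1: random initialization
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]     # toy set; true relation is y = 2x

for epoch in range(50):                         # 50 epochs, as recommended
    for x, y in data:                           # step 2: feed training examples
        pred = w * x
        error = pred - y                        # step 3: compute the output error
        grad = 2 * error * x                    # step 4: gradient of squared error
        w -= 0.05 * grad                        # gradient descent update
print(round(w, 3))  # converges to 2.0
```

Each epoch shrinks the remaining error by a fixed factor here, which is the same role the epoch count plays when training the real network: enough passes for the loss to settle, without wasting computation.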

Training a classifier
Classifying an object consists of assigning it to one of the available classes. Objects can be described by a list of features, such as the texture, size, or color of their pixels; to classify objects it is necessary to specify the boundaries between the different classes.
Typically these boundaries are computed through a training process using the features of a series of example models of the classes. Boundaries are mentioned for clarity; in general, a classifier deduces decision patterns during training.
It follows that classifying an unknown object consists of assigning it to the class whose features, as used during training, are most similar to the features of the object (Garrido Satué, 2013).
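A nearest-neighbor rule is one of the simplest concrete instances of this idea: the unknown object is assigned to the class of the most similar training example. The feature values and category labels below are hypothetical, chosen only to illustrate the mechanism.

```python
def classify(features, training_set):
    """Assign the unknown object to the class of the most similar training example."""
    def dist(a, b):
        # Squared Euclidean distance between two feature lists.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(training_set, key=lambda example: dist(features, example[0]))
    return nearest[1]

# Hypothetical two-feature examples labeled with anastomosis categories.
training = [([0.9, 0.1], "arterio-venous"),
            ([0.1, 0.9], "veno-arterial"),
            ([0.5, 0.5], "capillary")]
print(classify([0.8, 0.2], training))  # arterio-venous
```

The decision boundaries mentioned in the text are implicit here: they are the loci of points equidistant from two training examples of different classes.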

Once the training of the convolutional neural network has been completed, a classifier is trained on the result of said training, since the data can then be explored, features selected, cross-validation schemes specified, models trained, and results evaluated.
Automatic training can be performed to find the preferable type of classifier model (such as decision trees, discriminant analysis, support vector machines, logistic regression, nearest neighbors, or ensemble classification). Figure 5 shows the selected classifier, with data from the various trained classifiers and which one obtains the best classifier model for the data presented.

Prototype operation
A prototype was developed to test the operation of the convolutional neural network; the test was carried out with an end user who did not have much computer knowledge. The screen has three action buttons: browse, pause, and exit. Figure 6 shows a screen of the prototype working on anastomosis classification.
The data source must be a folder containing the video of a surgical intervention in MP4 format. The prototype then plays the video automatically while the convolutional neural network takes each frame, analyzes it, and, upon finding a match with the trained categories, marks it with a color (green for arterio-venous, yellow for capillary, and blue for veno-arterial); finally, it saves said frame on the device.
The nomenclature with which the image is saved is as follows, according to the selected categories:

The prototype output is a set of images classified and labeled by the convolutional neural network, according to the three previously mentioned categories. With the use of the prototype, the specialist doctor obtains results ready for validation.

Results
The main purpose of this research work was to determine whether a computational tool could help identify anastomoses within a fetoscopic video, and how accurate that identification would be. In this sense, during the development the following results were obtained:
• 12 complete stitched videos of the surgeries, with more than eight hours of fetoscopic video.
• From the 12 videos, a DataSet of 111 273 unclassified images was obtained.
• As a product of the manual labeling by the medical expert, the images classified by category shown in Table 1 were obtained.
In this sense, and in a general way, it can be mentioned that a confusion matrix is a tool commonly used to describe the performance of a classification system on a data set whose true values are known. The matrix includes information on the actual and predicted classifications made by the system, and is used to compute the performance of the algorithm. Another way to examine the performance of classifiers is the ROC (receiver operating characteristic) graph, a method for visualizing, organizing, and selecting classifiers based on their performance, where the ROC curve contains all the information of the confusion matrix (Fos Guarinos, 2016). A ROC result whose AUC lies between 0.5 and 1 indicates that the classifier's prediction is correct (the closer to 1, the closer to perfect); on the contrary, a result below 0.5 indicates inconsistencies in the classification of the data. Figure 8 shows each of the categories with its ROC curve and AUC.
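Both evaluation tools can be computed with a few lines. This sketch uses made-up labels and scores, not the study's data; the AUC is computed as the probability that a positive example outscores a negative one, which equals the area under the ROC curve.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows: actual class; columns: class predicted by the system."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

def auc(labels, scores):
    """AUC as the probability that a positive example outscores a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels for the three categories (AV, CAP, VA).
actual    = ["AV", "AV", "CAP", "VA", "CAP"]
predicted = ["AV", "CAP", "CAP", "VA", "CAP"]
print(confusion_matrix(actual, predicted, ["AV", "CAP", "VA"]))

# AUC of 1.0: every positive is ranked above every negative (perfect);
# 0.5 is no better than chance.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

The diagonal of the matrix counts correct classifications per category, which is how the effectiveness percentages reported above can be read off.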

Conclusions
In the present work, the creation of a computational tool to help the novice doctor identify anastomoses within a fetoscopic video was explained; this identification is part of the physician's preparation to perform an adequate intervention with laser surgery for twin-to-twin transfusion syndrome.
In this sense, it should be noted that the prototype created reads and plays fetoscopic videos and contains a convolutional neural network duly trained with three different categories of anastomosis images, which allows them to be classified, while the video is playing, with an effectiveness rate greater than 90%. However, the fluidity of the prototype is directly linked to the computer equipment used, so a high-performance machine is suggested (for example, with a solid-state drive of at least 500 GB, at least 8 GB of RAM, and a 64-bit operating system).
As an additional product, two image DataSets were obtained: one unclassified and the other duly classified by the expert physician, which was the basis on which the convolutional neural network used in the generated tool was trained. Therefore, it can be asserted that the prototype created will be of great help to new medical personnel training to perform fetoscopic surgeries, since it will give them a support tool.

Future lines of research
Finally, several future lines of research can be pursued: a comparative study and tests with other convolutional neural network architectures besides AlexNet, such as VGGNet, GoogleNet, ResNet, and ZFNet, in order to compare their results; increasing the number of categories to make the prototype more complete; increasing the number of images per category to optimize the accuracy of the results; and applying unsupervised learning to classify videos, in order to make adjustments to the network and strengthen it.