Validation by Expert Judgment of an Evaluation Instrument for Evidence of Conceptual Learning

This educational research work focuses on content validation, by expert judgment, of an evaluation instrument for evidence of conceptual learning, specifically the synthesis. To design the instrument, called the checklist-synthesis, the criteria and quality conditions that a synthesis must meet were identified, and a first draft was prepared in line with the EC0772 labor competency standard. Subsequently, a group of six experienced teachers evaluated the checklist-synthesis on four criteria: clarity, sufficiency, coherence and relevance. The empirical results show that ninety percent of the ratings assigned to the items were four, the maximum value on a scale of one to four. Statistical analysis (Friedman test) confirms agreement among the judges on three of the four criteria, so the evaluation instrument was revised and improved on the non-concordant criterion, clarity.


Introduction
This work presents the validation by expert judgment of a checklist-type evaluation instrument (EI). The instrument was used to assess evidence of conceptual learning in the form of a synthesis, understood as an abbreviated exposition of a subject that the student must explain in their own words to facilitate understanding.
Measurement is the process that links abstract concepts (latent variables or assumed constructs) with empirical indicators, since such concepts can only be measured through observable variables (Cupani, 2012). Consequently, any instrument for measuring or collecting data must meet three essential requirements: objectivity, validity and reliability (Hernández, Fernández and Baptista, 2014).
Objectivity refers to the degree to which the instrument is or is not permeable to the influence of the biases and tendencies of the researcher who administers, scores and interprets it. It is reinforced by standardizing the application of the instrument (same instructions and conditions for all participants) and the evaluation of the results, as well as by employing personnel trained and experienced in the instrument.
Validity is the degree to which an instrument actually measures the variable it is intended to measure. Three main types are identified in the literature: content, criterion and construct validity.
This work uses content validity by expert judgment, understood as the degree to which a measuring instrument appears to measure the variable in question, according to "qualified voices".
Reliability refers to the degree to which repeated application of the instrument to the same individual or object produces the same results. A measuring instrument can be reliable without being valid; it must therefore be shown to be both reliable and valid. If not, the results should not be taken seriously (Hernández et al., 2014).
Regarding the evaluation of learning with a focus on professional competencies, the evaluation must be objective because it integrates a set of evidence that can confirm the student's attainment of the competence; that is, it must be based on evidence produced in the learning activities determined by the teacher. The evaluation of competencies is therefore an integral, permanent, systematic and objective process that should serve to determine whether the goals established in the subject have been achieved (Tecnológico Nacional de México [TecNM], 2015, 2018).
The teacher must therefore integrate quantitative and qualitative information, as well as different types and forms of evaluation and a variety of instruments, so that the co-managers of the process can make timely decisions in pursuit of continuous improvement.
In this sense, the National Technological Institute of Mexico (TecNM) recommends the use of evaluation instruments to evaluate evidence (the product of conceptual, attitudinal and procedural learning). The competency standard EC0772, Assessment of learning with a focus on professional competencies, establishes as a certification requirement that TecNM teachers be able to design and apply at least four assessment instruments: the checklist, the observation guide, the questionnaire and the rubric. It also indicates the general characteristics of these EIs.
It is assumed that if the teacher uses an instrument to evaluate this evidence (one that states the requirements or quality conditions expected of the student), the students will know with certainty which requirements must be met and can design lines of action to meet them.
That said, validation by expert judgment uses non-parametric tests such as Friedman's test (a non-parametric analysis of a randomized block experiment), which offers an alternative to the two-factor analysis of variance (Granato, De Araújo Calado and Jarvis, 2014).
Randomized block experiments are a generalization of paired experiments, and Friedman's test is a generalization of the paired sign test.

Methodology
The methodology was quantitative: data were collected to test hypotheses through numerical measurement and statistical analysis. The research processes involved collecting, analyzing, integrating and discussing quantitative data in order to reach a better understanding of the phenomenon studied through inferences drawn from the information collected (Bernal, 2010; Hernández et al., 2014; Malhotra, 2008).

Materials
In the literature review stage, the main input was the articles and publications identified in the Virtual Library (BIVIR) of the Autonomous University of Ciudad Juárez, which has 30 databases, including Annual Review, Ebsco, Elsevier, Emerald, ScienceDirect and Wiley.
To evaluate the content and structure of the synthesis, a checklist-type EI was designed from the literature review to meet the quality conditions of the EC0772 labor competency standard. Statistical data analysis was carried out in the Minitab software package, version 17.

Method
Following a dynamic similar to that proposed by Romero, Gómez and Parroquín (2016), Romero, Matheus-Mari and Poblano-Ojinaga (2017) and Poblano-Ojinaga, López, Gómez and Torres-Arguelles (2019), this research was planned in three steps. In the first step, the criteria and quality conditions that a synthesis must meet were identified through a literature review and the opinion of two experts on the subject. This step follows from the fact that measurement is a process of linking abstract concepts with empirical indicators to record information or data about the variables, and that an EI records observable data that truly represent the concepts or variables under study. The result was a first draft of the list of criteria and quality conditions that the EI should contain.
In the second step (design of the evaluation instrument), based on the information obtained in step one and considering the structure required in element 4 of EC0772, the requirements or quality conditions that a synthesis must contain were defined, which served to prepare a draft of the checklist.
Vol. 12, Núm. 22, Enero-Junio 2021, e240
The third step was the content validation of the EI through expert judgment: a group of experts was asked to evaluate the checklist in four areas (sufficiency, clarity, coherence and relevance), following the procedure recommended by Escobar-Pérez and Cuervo-Martínez (2008).
Non-parametric tests were used for validation by expert judgment because the data were nominal, categorical and ordinal. The Friedman test was selected because it detects differences in central location (median) in one-way repeated-measures designs with three or more dependent samples (Granato et al., 2014). Here it was used to determine the degree of agreement among the experts, with the p-value serving to choose between two opposing hypotheses:
H0: The experts' ratings do not differ significantly (the treatment effects are zero), that is, there is agreement among the experts/judges.
H1: The experts' ratings differ significantly, that is, there is no agreement among the experts/judges.
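As an illustration of the test described above, the following Python sketch computes the Friedman statistic, its tie-corrected version and an approximate chi-square p-value using only the standard library. The ratings are hypothetical (they are not the study's data), and treating the judges as treatments and the checklist items as blocks is an assumption, since the layout entered in Minitab is not stated here. The helper names (average_ranks, chi2_sf, friedman) are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values 1..k within one block; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series.

    Adequate for the small statistics typical of this kind of analysis;
    it is not tuned for very large x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def friedman(blocks):
    """Return (S, S corrected for ties, degrees of freedom).

    blocks: one row per block, one column per treatment.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)  # size of each group of tied values
            tie_term += t**3 - t
    s = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    s_adj = s / correction if correction > 0 else float("nan")
    return s, s_adj, k - 1


# Hypothetical ratings, NOT the study's data: rows are checklist items (blocks),
# columns are three judges (treatments), values on the 1-4 scale.
ratings = [
    [4, 4, 3],
    [4, 3, 3],
    [4, 4, 4],
    [3, 4, 2],
    [4, 3, 3],
]
s, s_adj, df = friedman(ratings)
print(f"S = {s:.3f}, p = {chi2_sf(s, df):.3f} (not adjusted for ties)")
print(f"S = {s_adj:.3f}, p = {chi2_sf(s_adj, df):.3f} (adjusted for ties)")
```

With many tied ratings, as is common on a 1-to-4 scale, the corrected and uncorrected statistics can differ noticeably, so it may be worth reporting both.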

Results
In the first step, the criteria and quality conditions to be considered when evaluating a synthesis presented by students as evidence of a conceptual learning activity were identified (Beltrán, 2005; Fernández and Bressia, 2009; Quispe and Melanez, 2018). Likewise, two teachers (deans of the Economic-Administrative Department of the La Laguna campus) were asked to review the list of quality conditions or evaluation criteria for the synthesis and to offer recommendations and suggestions. The result of this stage was a first draft of the checklist that included five criteria and ten quality conditions (Table 1).
Table 1. First draft of the checklist: criteria and quality conditions (fragment)
In the student's own words and without spelling errors.
Substance: the main idea.
Development: the points that revolve around the main idea.
Source: own elaboration.
In step two (EI design), the first draft of the checklist-synthesis was prepared taking into account the comments and changes proposed by the deans, as well as the structure required in element 4 of EC0772. The performance criteria are listed below:
• The evaluation instruments, based on the teaching-learning activity and the competence developed, correspond to the level of performance of the competencies.
• The prepared checklist includes items that comply with the following structure (article, object, verb and quality condition) and corresponds to the type of evidence to be evaluated (Consejo Nacional de Normalización y Certificación de Competencias …).
The result of this stage was the instrument presented to the experts for their evaluation (Figure 1).

Figure 1. Instrument for the evaluation of the checklist-synthesis by expert judgment. Source: own elaboration.
In step three (validation of the EI by experts), a group of 10 experts (teachers with extensive experience) was asked to evaluate the checklist-synthesis in four areas: sufficiency, relevance, clarity and coherence.
The experts were selected on three criteria: teaching experience (at least 15 years), academic training (preferably a doctorate in education or a related area) and actively teaching a class at the time of the study. Of the evaluations submitted by the 10 selected teachers, 4 were eliminated due to inconsistencies, leaving 6 judges. An initial exploration of the results across the four criteria showed that, of the 210 values assigned to the items/criteria, 90% were a 4 and 8.6% were a 3, which empirically suggests agreement among the 6 judges (Figure 2).
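The shares reported above can be checked with a quick tally. The sketch below assumes counts back-calculated from the reported percentages (189 fours and 18 threes out of 210); the value of the remaining 3 ratings is not reported, so the 2 used here is only a placeholder.

```python
from collections import Counter

# Counts reconstructed from the reported percentages (an assumption):
# 90% of 210 = 189 ratings of 4; 8.6% of 210 is about 18 ratings of 3.
# The value of the remaining 3 ratings is not reported; 2 is a placeholder.
ratings = [4] * 189 + [3] * 18 + [2] * 3

counts = Counter(ratings)
total = len(ratings)
shares = {value: round(100 * n / total, 1) for value, n in sorted(counts.items())}

for value, share in shares.items():
    print(f"rating {value}: {counts[value]} of {total} ({share}%)")
```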

Low level: the items measure some part of the aspect to be evaluated but do not correspond to the whole.

Low level: the item requires quite a few modifications, or a very large modification, in the use of the words according to their meaning or in their ordering.
Syntax: the part of grammar that studies the way in which words and the groups they form are combined to express meanings, as well as the relationships established among all these units.
As an example, for the clarity criterion, each expert assigned each item a value between 1 and 4, as they considered appropriate (Table 2).
[Table 2: values assigned by the experts to each item for the clarity criterion. Source: own elaboration.]
Subsequently, the data in Table 2 were analyzed with the Friedman test in Minitab, version 17. The hypotheses tested were H0 and H1 as stated in the Method section.
[Minitab output for the clarity criterion (fragment): grand median = 4.0000. Source: own elaboration.]
* Minitab prints the test statistic, which has an approximately chi-square distribution, and the associated degrees of freedom (number of treatments minus one). If there are ties within one or more blocks, the average rank is used and a test statistic corrected for ties is also printed. If there are many ties, the uncorrected test statistic is conservative; the corrected version is usually closer, but can be conservative or liberal. Minitab also shows an estimated median for each treatment level. The estimated median is the grand median plus the treatment effect.
In the same way, the Friedman test was carried out for the criteria of sufficiency, relevance and coherence.

Discussion of results
The results of the statistical analysis (Table 4) show that, for the sufficiency, relevance and coherence criteria, the test statistic S has a p-value greater than 0.050 (not adjusted for ties). Because the p-value is greater than the alpha level, there is insufficient evidence to reject H0; for these criteria it is therefore concluded that the data are consistent with the hypothesis that the treatment effects are zero. In other words, there is agreement among the experts on the items that a synthesis checklist should evaluate.
However, for the clarity criterion, the test statistic S has a p-value less than 0.050 (not adjusted for ties). Because the p-value is less than the alpha level, there is sufficient evidence to reject H0; in other words, there is no agreement among the experts, so the checklist needs to be reviewed and improved on this criterion.
Content validation of research instruments through expert judgment is more efficient when the criteria to be evaluated by the experts are clearly specified and when statistical analyses, such as the Friedman test, are used.

Conclusions
The empirical results of this research work reflect that 90% of the items evaluated were assigned a rating of 4, the maximum value on a scale of 1 to 4. The statistical analysis by means of the Friedman test confirms the agreement or concordance among the judges in 3 of the 4 criteria, so the checklist-synthesis was revised and improved in the clarity criterion (see annex).
This research is based on the assumption that the lack of a standardized EI could negatively affect the student's attainment of the competence, because the criteria for the evidence presented by the student are not formally or clearly defined by the teacher. The objective is therefore met by having an EI, the checklist-synthesis, that is formal, easy for students and teachers to use, and duly validated by expert judgment.
On the other hand, when selecting the people who will evaluate the measurement or evaluation instrument, care should be taken that they know the subject matter, whether from experience in the labor field, from their professional training or from their academic career. It is also advisable to decide in advance the number of experts who will act as judges, according to the characteristics of the test and the statistical analysis.

Future Research Lines
This work is the first stage of the educational research project Design and validation of an assessment instrument (EI) for conceptual learning based on TecNM guidelines and the EC0772 standard, and their effect on failure rates (ITF-LLAG-PIE-2019-0228). The next stage will therefore focus on evaluating the internal consistency (reliability) of the instrument by means of Cronbach's alpha through a pilot run in different groups.
A limitation of this study is that the validation of the EI by expert judgment was carried out in a higher education institution belonging to the National Technological Institute of Mexico, so the results cannot be generalized. However, since the validation method is presented clearly and concisely, the procedure could be useful for other instruments, such as questionnaires and rubrics.
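For reference, Cronbach's alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The following is a minimal sketch with hypothetical pilot scores (not data from this project), assuming each checklist item is scored 1 when the quality condition is met and 0 otherwise; the function name cronbach_alpha is illustrative.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per evaluated synthesis, one column per checklist item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical pilot scores (assumption): 6 syntheses, 4 checklist items.
pilot = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(pilot)
print(f"Cronbach's alpha = {alpha:.3f}")
```

The sketch uses sample variances and requires at least two items and some variation in the total scores; otherwise the ratio is undefined.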