Referee: 2

Comments to the Author

This referee thinks that the paper should be revised on the following points before being accepted for publication in EPJA.

1. The introduction starts with "The investigation of nuclear matter under extreme conditions of high temperature and high density is one of the major research topics in modern physics [1]". This is obviously true, but it is also true that beam energies of 1-2 A GeV are far from being "extreme"; similar measurements have already been made at much higher CM energies at CERN and BNL. Moreover, reference [1] should be complete and not refer to a single publication.

2. The UrQMD code is an important element of the paper. Notice that it is called a "transport code" initially and a "model" later on. Usually, a "model" in physics is a coherent picture of a set of phenomena, which is almost a theory. Is UrQMD at such a level? The conclusions of the paper cannot be evaluated without an understanding of this "model": its assumptions, its limits, its relevance for the physics, etc. These should be summarised in the paper.

3. The same holds for the PLUTO "generator", also called a "model". At one point it is claimed to be "described above", but this is not true. The reader cannot understand what it is.

4. Table 2:
a. The units of the "inverse slopes" T are not specified.
b. The quoted uncertainties on similarly measured quantities range from 2 per mille to 20 per cent. This fact raises serious doubts, all the more so because no discussion of the errors is included.
c. There is likewise no discussion of the uncertainties on the UrQMD results; no conclusion can be reached without knowing the statistical and, even more, the systematic uncertainties.
d. Consider the 1 GeV data. The authors fit the pi+ and pi- data separately with a single exponential function. The fit is statistically extremely good and, consequently, there is no statistical justification to complicate matters. The authors do not declare any non-statistical reason to do so, but go on and fit the same data with a superposition of two exponentials. The slopes of the pi+ and pi- samples, which were equal with the one-exponential fit, are now different, and this is claimed to be statistically significant (5 standard deviations!!). This conclusion looks like a statistical artifact! It might be traced to the same cause that produces uncertainties differing by two orders of magnitude on similar quantities. (A sketch of the statistical test in question is given after these comments.)
e. Similar considerations apply to the 2 GeV data and simulations in the table.
f. Notice that in the text discussing the table the pion charges are called "species".

5. It is extremely difficult to appreciate the physical conclusions of the paper. For example, Section 4.2 discusses the rapidity distributions. The measured distributions are compared (even if only in a rather small fraction of the phase space) with two simulations, UrQMD and PLUTO (which, as mentioned above, have not been described). PLUTO appears to agree with the data, UrQMD to strongly disagree. But what do we learn from that? This is an example of a more general issue: data are compared with one or two Monte Carlo simulations, which are not described. What are the implications of a disagreement, or of an agreement, between data and simulations? If a disagreement is found, is it just a matter of adjusting a few parameters in the Monte Carlo, or are some of its basic physical assumptions wrong? Learning about a Monte Carlo might be interesting, but, more basically, what do we learn about nuclear matter under the (not-so-)extreme conditions of the experiment?
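To make the test invoked in point 4d concrete, the following is a minimal sketch, in Python with invented spectrum values (not the authors' data or procedure), of the standard F-test for nested models that would have to justify the second exponential:

    # Sketch: does adding a second exponential to an m_T spectrum fit improve
    # chi^2 more than expected by chance? All numbers below are placeholders.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import f as f_dist

    def one_exp(mt, c, t):
        return c * np.exp(-mt / t)

    def two_exp(mt, c1, t1, c2, t2):
        return c1 * np.exp(-mt / t1) + c2 * np.exp(-mt / t2)

    np.random.seed(0)
    # Hypothetical m_T spectrum (MeV) with 3% point-to-point errors.
    mt = np.linspace(150., 800., 14)
    y  = two_exp(mt, 1.0e4, 55., 3.0e2, 95.) * np.random.normal(1.0, 0.03, mt.size)
    dy = 0.03 * y

    p1, _ = curve_fit(one_exp, mt, y, p0=[1e4, 60.], sigma=dy, absolute_sigma=True)
    p2, _ = curve_fit(two_exp, mt, y, p0=[1e4, 50., 1e2, 100.], sigma=dy,
                      absolute_sigma=True)

    chi2_1 = np.sum(((y - one_exp(mt, *p1)) / dy) ** 2)  # ndf = n - 2
    chi2_2 = np.sum(((y - two_exp(mt, *p2)) / dy) ** 2)  # ndf = n - 4
    ndf1, ndf2 = mt.size - 2, mt.size - 4

    # F-test for nested models: small p-value => second exponential justified.
    F = ((chi2_1 - chi2_2) / (ndf1 - ndf2)) / (chi2_2 / ndf2)
    p_value = f_dist.sf(F, ndf1 - ndf2, ndf2)
    print(f"chi2/ndf {chi2_1/ndf1:.2f} -> {chi2_2/ndf2:.2f}, F = {F:.1f}, p = {p_value:.2g}")

Applied to data that a single exponential already describes well, as is claimed for the 1 GeV samples, such a test would return a large p-value and the second slope would not be statistically warranted.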
Referee: 1

Comments to the Author

A detailed measurement of pion production in 12C+12C collisions, as a basis for the analysis and interpretation of electron measurements in the HADES experiment, is certainly worthy of publication in EPJA. The charged pion results presented in this paper appear to be of high quality, and the results on the anisotropy of the pion emission are particularly interesting. In summary, the data are certainly worthy of publication. However, the presentation and discussion of the results must be improved before publication, as discussed in detail below.

This paper compares the HADES results with previous measurements of pion production in C+C collisions at 1 and 2 A GeV by the KaoS and TAPS collaborations. It is very important that this comparison be made, and a careful comparison with the neutral pion production results of TAPS is especially important for the HADES program of electron measurements. However, the paper makes the comparisons only with integrated multiplicities, removing nearly all information needed for a meaningful comparison. The KaoS and TAPS pion spectra were measured in a limited acceptance region, whereas HADES has measured the charged pions over a large rapidity interval. Therefore the HADES spectra should be compared directly to the pion spectra from KaoS and TAPS for the same (or a similar) rapidity acceptance, and ideally with the same trigger condition, i.e. a data-to-data comparison should be made to as great an extent as possible. It would be much more informative to see, for example, a ratio of pion spectra, e.g. TAPS/HADES, at the same rapidity (a sketch of such a ratio with propagated errors is given below). Is there some reason that this has not been done?

Unfortunately, this paper presents no minimum-bias or unbiased pion spectra. It is an important question whether the LVL1 trigger selection on multiplicity, i.e. a centrality trigger, is associated with a modification of the spectra. The paper does not address this question, nor does it acknowledge to what extent this might change the conclusions with respect to the shape of the pion spectra or the measured anisotropy. In fact, since the charged-particle multiplicity is mostly an indication of centrality, i.e. of the number of participants, it is largely meaningless to compare multiplicities for different centrality selections, or to compare multiplicities after "correcting" them for the centrality selection based on the calculated number of participants, since this simply reflects a comparison of centrality selections. As stated above, the only interesting comparison is then a data-to-data comparison of the spectral shapes in the same rapidity interval: similar shapes would indicate how well the measurements agree (or how significantly the shape depends on the centrality selection), while normalization differences will reflect normalization errors, which may be mostly centrality corrections for centrality-selected data, as is the case for the HADES data. Stated again, I find the comparisons of pion multiplicities as presented in this paper (Section 4.3) nearly meaningless.
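As an illustration of the suggested data-to-data comparison, here is a minimal sketch in Python; the two spectra below are invented placeholders, not actual TAPS or HADES points:

    # Sketch: bin-by-bin ratio of two pion spectra in a common rapidity window,
    # with uncorrelated errors propagated to the ratio. Placeholder numbers.
    import numpy as np

    mt   = np.array([200., 300., 400., 500., 600.])            # bin centres, MeV
    y_a  = np.array([8.1e-3, 2.9e-3, 1.1e-3, 4.0e-4, 1.5e-4])  # "TAPS-like" spectrum
    dy_a = 0.08 * y_a
    y_b  = np.array([7.6e-3, 3.0e-3, 1.0e-3, 4.2e-4, 1.4e-4])  # "HADES-like" spectrum
    dy_b = 0.05 * y_b

    r  = y_a / y_b
    dr = r * np.sqrt((dy_a / y_a) ** 2 + (dy_b / y_b) ** 2)    # quadrature sum

    for m, ri, dri in zip(mt, r, dr):
        print(f"m_T = {m:4.0f} MeV: ratio = {ri:.2f} +/- {dri:.2f}")

A ratio flat in m_T within errors would demonstrate consistent spectral shapes, while a common offset would isolate normalization differences, e.g. from the centrality corrections.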
I have additional questions about how this was done. For example, the extrapolation of the HADES data to the full-acceptance yield is stated to be done by two methods, A and B, based on UrQMD or on a data parameterization and extrapolation simulation called PLUTO. Method B is the natural method one would use, with the estimated errors on the parameters of the parameterization used to extract the estimated systematic error on the yield extrapolated to full acceptance. This should be done, and those errors should be given in Table 3 (a sketch of such an error propagation is given below). On the other hand, it is not explained how UrQMD was used: were the UrQMD results arbitrarily normalized to the data somewhere and then used to extract the integral result? If anything similar to this was done, then the UrQMD results should not be included in Table 3 and called a "method" to extrapolate. Instead, the UrQMD integrated result should be quoted as an UrQMD result and compared to the PLUTO result with errors (errors that would include the uncertainty on the assumed rapidity distribution, and that presumably would also encompass an UrQMD-like assumption on the rapidity distribution). The UrQMD results in Table 3 apparently are not pure UrQMD results, since otherwise the yields in Table 3 would be the same as the multiplicities in Table 1, which they are not, except for the 1 A GeV pi+ case. I would not at all agree that the comparison of the UrQMD (method A) and PLUTO (method B) results gives an estimate of the systematic error, contrary to what is stated. Method B should be used for the extrapolation of the data, with the errors on the parameters of the parameterization used to estimate the systematic errors, which should then be given in Table 3.

Along these lines, more questions are raised by the centrality selection. As seen from Figure 7, the UrQMD track multiplicity distribution peaks at a slightly higher multiplicity than the measured one. Does UrQMD describe the TOF multiplicity used in the trigger better than shown here? Or do the differences between the data and UrQMD multiplicities just reflect systematic errors in how well the trigger selection has been implemented in UrQMD? It would be useful to see how well the UrQMD simulation matches the actual trigger multiplicity distribution, with a figure showing both the data and UrQMD distributions after the trigger cuts have been applied.

One of the two main results of the paper is the tabulation of multiplicities in Table 4. As already discussed, this is hardly an interesting final result. Nevertheless, there are problems here. The actual measured and extrapolated multiplicities in Table 3 should be given with errors (as mentioned above), and presumably there are additional errors in the assumption of participant scaling and in the calculation of the number of participants, which is based on UrQMD only. Those errors are certainly significant, yet they are neither discussed nor quoted. Furthermore, the KaoS measurement quoted at 2 A GeV was obtained at 1.8 A GeV according to reference [20], and the multiplicity quoted is not the same as that given in reference [20]; it has obviously been modified from the KaoS result, but this is not described. Also, the reference given for TAPS [21] is not a TAPS reference but a discussion of TAPS results. Either the primary TAPS reference should be used (preferred), or it should not be quoted as a TAPS reference.

As already mentioned, a direct comparison of spectral shapes between the HADES results and the KaoS and TAPS results would be very interesting. It is noted that a single thermal slope can describe the HADES mT spectra at 1 A GeV, but not at 2 A GeV. It is further noted that this conclusion is similar to that of KaoS for charged pions, but not to that for neutral pions, since a single slope parameter was used to describe the TAPS results.
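To make concrete the error propagation requested above for method B, here is a minimal sketch in Python; the Gaussian rapidity parameterization and all numbers are assumptions for illustration, not the paper's actual PLUTO parameterization:

    # Sketch: fit a parameterization to dN/dy inside the acceptance, then
    # propagate the fit-parameter covariance to the 4pi yield by sampling.
    import numpy as np
    from scipy.optimize import curve_fit

    def dndy(y, n0, sigma):
        # Hypothetical parameterization: Gaussian about midrapidity (y - y0).
        return n0 * np.exp(-0.5 * (y / sigma) ** 2)

    np.random.seed(1)
    y_acc  = np.linspace(-0.6, 0.4, 9)                 # measured rapidity window
    n_acc  = dndy(y_acc, 1.2, 0.8) * np.random.normal(1.0, 0.04, y_acc.size)
    dn_acc = 0.04 * n_acc

    popt, pcov = curve_fit(dndy, y_acc, n_acc, p0=[1.0, 1.0],
                           sigma=dn_acc, absolute_sigma=True)

    # Sample parameter sets from the fitted covariance and integrate each one.
    y_full = np.linspace(-2.5, 2.5, 501)
    pars   = np.random.multivariate_normal(popt, pcov, size=2000)
    yields = [np.trapz(dndy(y_full, *p), y_full) for p in pars]

    print(f"4pi yield = {np.mean(yields):.3f} +/- {np.std(yields):.3f} (extrapolation error)")

The spread of the sampled integrals is the kind of extrapolation error that should accompany the method B entries in Table 3.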
Concerning the TAPS fits: in ref. [21] it is apparent that the neutral pion yield systematically rises above a fit restricted to the high-pT part of the spectra, so a two-slope fit would clearly give an improved result there as well. Again, it would be most interesting to have a direct data-to-data comparison of the various pion SPECTRA, either by taking ratios of data or by applying the same fit procedures.

The measurements of the anisotropy of the pion yields are very interesting and deserve greater discussion (a minimal sketch of such an anisotropy fit is given below). The errors shown in Figure 11 need further explanation; presumably they are dominated by systematic errors. The comparison of the results reflected about y0 is an important check, but this check suggests that the systematic errors are larger than estimated. Since the errors are presumably dominated by systematics, it would be useful to further divide the data sample based on centrality, i.e. track multiplicity, to investigate the dependence of the anisotropy on centrality. As discussed in the paper, there are indications from previous results that the anisotropy almost certainly depends on the centrality of the collision, and this data sample should allow a more detailed statement on this point. Furthermore, if the anisotropy is seen to depend on the centrality, this again raises the issue of how accurately the centrality selection has been implemented in the UrQMD comparison. Has the UrQMD result been calculated for variations of the centrality selection to estimate the systematic error on the UrQMD comparison?

The results shown in Figure 12 are very interesting, but I have several questions and comments about this figure. First, data points appear to be missing for 1 A GeV pi+ at 550 MeV/c and for 2 A GeV pi+ at 650 MeV/c, compared to pi-. Is there a reason? Also, several UrQMD points appear to be missing; they probably overlap with the data, but this cannot be seen given the small size of the figure and of the plot points. A statement must be made about the UrQMD errors: none are shown. Does that mean the errors, statistical and systematic, are negligible, or just that they have not been shown? Before drawing any conclusion about the comparison with UrQMD, a statement about the UrQMD errors must be made. Taking the UrQMD errors as negligible, it is stated that the UrQMD model results tend to level off at larger values than the data. It is not clear what is meant by this statement. For the 2 A GeV results, UrQMD and the data appear to be in quite good agreement, with the pi- data having perhaps larger values of the anisotropy, in contradiction to the statement made. On the other hand, for the 1 A GeV case UrQMD does have larger values of the anisotropy, but the data appear to "level off" at a larger value of the momentum. Again, without a clear statement about the UrQMD errors, conclusions from the comparison with the data are meaningless. Figure 11 should include the UrQMD results. There appears to be a dashed line, at least for the 1 A GeV case; what does it represent?

In the summary it is stated that the reasonable agreement of the spectra with UrQMD indicates that the degree of thermalization is adequately reproduced in the model. How does this follow from the discussion in the text, where it is noted that at 2 A GeV the data are better fit with two slopes, unlike UrQMD?
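For reference, here is a minimal sketch of an anisotropy extraction of the kind discussed above, assuming the conventional parameterization dN/dcos(theta_cm) proportional to 1 + A2*cos^2(theta_cm); the counts are placeholders, not HADES data:

    # Sketch: fit the polar-angle distribution in one momentum bin; repeating
    # per bin (and per centrality class) yields A2(p) curves as in Figure 12.
    import numpy as np
    from scipy.optimize import curve_fit

    def ang_dist(cos_th, n0, a2):
        return n0 * (1.0 + a2 * cos_th ** 2)

    np.random.seed(2)
    cos_th = np.linspace(-0.7, 0.7, 15)               # covered cos(theta_cm) range
    counts = np.random.poisson(ang_dist(cos_th, 1.0e3, 0.8)).astype(float)
    errs   = np.sqrt(counts)                          # toy statistical errors only

    popt, pcov = curve_fit(ang_dist, cos_th, counts, p0=[1e3, 0.5],
                           sigma=errs, absolute_sigma=True)
    print(f"A2 = {popt[1]:.2f} +/- {np.sqrt(pcov[1, 1]):.2f}")

Splitting the sample by track multiplicity and repeating this fit would address the centrality dependence raised above.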
In the discussion of the anisotropy in the summary it is noted that the data do not support a rise and fall of the anisotropy with momentum (true). Such a rise and fall is, however, seen in the UrQMD results of Figure 12 but is not discussed, so on this point the data are not in agreement with UrQMD, contrary to the implication in the summary. Also, the effects of Delta excitation on the asymmetry are mentioned so superficially in the text that the conclusions in the summary appear unsupported.

General comments:

Errors:
* It was stated that the systematic errors were estimated based on comparisons of the results from different sectors of the HADES spectrometer. Are the measurements from all sectors entirely independent? Are there common potential systematic errors, such as the absolute field measurements, the tracking chamber geometry, or the PID yield extraction procedures, that are not reflected in sector comparisons?
* If yields are to be extrapolated to minimum bias based on participant scaling, the errors associated with this assumption and calculation need to be discussed and presented.
* The number of UrQMD events simulated should be stated. Are the statistical errors on the UrQMD results negligible? What are the systematic errors resulting from the centrality selection?
* On page 15, line 38 it is stated that the asymmetry of dN/dy about y0 is used to "control" the systematic errors. What exactly does "control" mean? Was it used as a basis of the systematic error estimate, or simply as a check?
* The systematic errors should be included in the tables.

Figures:
* All of the captions of Figures 8-12 should explicitly state that the results are for (semi-central) triggered data, to ensure that the results are not misinterpreted by a casual reader as minimum-bias results.
* Figures 10-12 should be made larger.

Other minor comments:
* Hyphenation is not used consistently: charged-pion, time-of-flight.
* Tables 1 and 2 are out of sequence.
* "Table" should not be abbreviated as "Tab."

Some suggestions:
* p. 2, line 38: "The main emphasis of the HADES program of measurements is on the di-electron signal..."
* p. 2, line 47: "The yields, transverse mass, and angular distributions..."
* p. 4, line 29: "from the charged pion yields measured by HADES in the same C+C data samples."
* p. 6, line 11: "trajectories are constructed and their momenta are deduced."
* p. 7, line 31: "more sophisticated analyses, like..."
* p. 7, line 48: "particles with different mass occupy different..."
* p. 7, line 58: "and TOF (left) regions."
* p. 8, line 35: "10% of tracks identified as pions are muons from..."
* p. 10, line 24: {\rm MeV}
* p. 14, line 52: "As seen in the previous section..."
* p. 15, line 44: "rapidity distribution is about 20%..."
* p. 15, line 54: "underestimation of our data by UrQMD is observed."
* p. 16, line 20: {\rm fm} -- Why is this estimate of the cross section used? UrQMD should be used to consistently extract the total cross section, the triggered cross section, and the corresponding number of participants.
* p. 18, line 54: "considered to suggest collectivity due to..."
* p. 19, line 46: "1 and 2 A GeV have been measured..."