Loading...
Loading...
- **Title:** Expanding Queries for Code Search Using Semantically Related API Class-names
# Analysis of Paper 50 ### Task 1: Extract Key Metadata - **Title:** Expanding Queries for Code Search Using Semantically Related API Class-names - **Authors:** Feng Zhang, Haoran Niu, Iman Keivanloo, Ying Zou - **Publication Year:** 2017【4:15†source】. ### Task 2: Summarize the Paper The paper proposes an automated approach to enhance code search engines by expanding natural-language queries with semantically related API class-names. The methodology involves using a neural network language model to identify and recommend API class-names that are semantically related to the initial query. The study evaluates the effectiveness of this approach using a large corpus of code snippets and compares it against three off-the-shelf query expansion methods. Key findings indicate that the proposed approach significantly improves search performance, retrieving relevant code examples more effectively than existing methods. The paper concludes that query expansion with semantically related identifiers can substantially enhance code search engines【4:15†source】. ### Task 3: Extract Dataset Usage The paper uses a large-scale corpus of code snippets extracted from 24,666 open-source Java projects, totaling 23,677,216 code snippets. It also utilizes a benchmark dataset created for clone detection by Svajlenko et al., which includes code snippets from ten distinct functionalities, expanded to 40 in the new version【4:0†source】【4:9†source】. ### Task 4: Analyze BigCloneBench Usage - **Q A:** Is the paper a systematic literature review (SLR) or a survey? - **A:** No. - **Explanation:** The paper presents a novel approach rather than reviewing existing literature systematically. - **Quote:** "In this paper, we propose an automated approach..."【4:15†source】. - **Q B:** Does the paper present a novel clone detection or clone search approach or evaluate existing approaches? - **A:** Yes, it presents a novel approach. - **Explanation:** The paper introduces a new method for query expansion in code search. - **Quote:** "In this paper, we propose an automated approach..."【4:15†source】. - **Q C:** Does the paper use BigCloneBench for evaluation? - **A:** Yes. - **Explanation:** The paper uses a benchmark dataset created for clone detection by Svajlenko et al., which is related to BigCloneBench. - **Quote:** "The first golden set is obtained from a benchmark that was created for the clone detection purpose by Svajlenko et al."【4:9†source】. - **Q D:** Is BigCloneBench used as ground truth for training a machine learning approach? - **A:** No. - **Explanation:** The dataset is used for evaluation, not for training a machine learning model. - **Quote:** "A golden set is essential to evaluate how our approach performs..."【4:0†source】. - **Q E:** Which version of BigCloneBench (old/new) has been used? - **A:** New version. - **Explanation:** The paper mentions the expansion to 40 functionalities, which corresponds to the new version. - **Quote:** "The number of target functionalities has been expanded to 40 in the new version of the benchmark dataset."【4:9†source】. - **Q F:** Has the ground truth of BigCloneBench been filtered/modified? If so, what is the size of the subset used? - **A:** Yes, it has been filtered. - **Explanation:** The authors manually reviewed and selected 24 target functionalities from the benchmark. - **Quote:** "The authors manually reviewed the 40 target functionalities in the benchmark and finally obtained 24 target functionalities."【4:9†source】. - **Q G:** Has the WT3/T4 subset been excluded from evaluation? If filtered, was WT3/T4 part of the subset? - **A:** Not explicitly mentioned. - **Explanation:** The paper does not specify the exclusion of WT3/T4. - **Quote:** Not available. - **Q H:** Ignoring filtering, has the ground truth otherwise been changed, extended, or enriched? - **A:** No. - **Explanation:** The paper uses the dataset as provided for evaluation purposes. - **Quote:** "A golden set is essential to evaluate how our approach performs..."【4:0†source】. - **Q I:** Has the paper used a subset created by previous work? - **A:** Yes. - **Explanation:** The subset of functionalities was selected from a benchmark created by Svajlenko et al. - **Quote:** "The first golden set is obtained from a benchmark that was created for the clone detection purpose by Svajlenko et al."【4:9†source】. - **Q J:** Has the paper validated or manually investigated BigCloneBench’s ground truth? - **A:** Yes. - **Explanation:** The authors manually reviewed the target functionalities. - **Quote:** "The authors manually reviewed the 40 target functionalities in the benchmark..."【4:9†source】. - **Q K:** Does the paper cite "BigCloneBench Considered Harmful for Machine Learning"? - **A:** No. - **Explanation:** There is no mention of this citation in the provided excerpts. - **Quote:** Not available. ### Task 5: Critical Analysis – Impact of New Findings - **Q L:** Does this finding weaken or invalidate any claims in the paper? - **A:** Potentially, yes. - **Explanation:** If 93.35% of WT3/T4 pairs are not clones, it could affect the validity of evaluations using these pairs as ground truth. - **Impact:** This finding could impact the methodology and conclusions, particularly if WT3/T4 pairs were used in the evaluation. It raises concerns about the reliability of the dataset as a benchmark for clone detection, potentially affecting the generalizability of the study's results. However, without explicit mention of WT3/T4 usage, the direct impact is uncertain. # Summary ```plaintext Question, A, B, C, D, E, F, G, H, I, J, K, L Paper 50, No, Yes, Yes, No, New, Yes, Not specified, No, Yes, Yes, No, Potentially ``` **Note:** - **G:** The paper does not specify whether the WT3/T4 subset was excluded from evaluation. - **L:** The finding that 93.35% of WT3/T4 pairs are not clones could potentially weaken the study's claims if these pairs were used, but the direct impact is uncertain without explicit mention of WT3/T4 usage.
- Without a harness, you **can't compare** prompts, models, retrieval configs, or costs.
Evaluate, benchmark, and regression-test AI/LLM systems. Covers evaluation framework design, benchmark creation, human evaluation protocols, automated evaluation (LLM-as-judge), regression testing, statistical significance, and continuous evaluation pipelines.
<img width="1388" height="298" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />
A list of all public EEG-datasets. This list of EEG-resources is not exhaustive. If you find something new, or have explored any unfiltered link in depth, please update the repository.