Retired research project topics 2017-

Programming practices spread model 🚧

The research will analyze the spread over time of programming practices, such as continuous integration, unit testing, static analysis, and code reviews. The model can be based on corresponding epidemiological models from the field of public health. The analysis will be carried out on the infrastructure of the University of Tennessee, Knoxville in collaboration with Prof. Audris Mockus. Instructions for using the infrastructure are available here. Evidence for the proposed method can be gleaned from the study of Yuxing Ma and his colleagues entitled A Methodology for Analyzing Uptake of Software Technologies Among Developers.
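
As a rough illustration of how an epidemiological model might be adapted, the sketch below simulates a discrete-time SIR-style process over a population of projects. The compartments (not yet adopted, currently using, abandoned), the rates `beta` and `gamma`, and the function name are assumptions made for illustration; they are not part of the proposed study or of the cited methodology.

```python
def simulate_adoption(population, initial_adopters, beta, gamma, steps):
    """Discrete-time SIR-style model of practice adoption.

    s: projects that have not adopted the practice yet
    i: projects currently using it (they "infect" others)
    r: projects that abandoned it
    beta and gamma are hypothetical per-step adoption and abandonment rates.
    """
    s = float(population - initial_adopters)
    i = float(initial_adopters)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        # New adoptions grow with contacts between adopters and non-adopters.
        new_adoptions = min(s, beta * s * i / population)
        new_abandonments = gamma * i
        s -= new_adoptions
        i += new_adoptions - new_abandonments
        r += new_abandonments
        history.append((s, i, r))
    return history
```

Fitting `beta` and `gamma` to observed adoption dates of, say, continuous integration would then indicate how well such a model describes the spread.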

Assigned, 2020, retired 2025.

The accuracy of code ownership algorithms

A number of studies, such as those by C. Bird, M. Greiler, and M. Foucault, have used a data analysis algorithm to determine ex post who is the owner of each source code file. Yet, various open source software projects, such as Linux, maintain a public list of module owners. This study will compare the accuracy of the algorithm’s results against the contents of ownership lists, determine the reasons for discrepancies, and assess the effect these might have on studies employing the algorithm.
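
The comparison could be sketched as follows. The heuristic used here (the owner of a file is the author with the largest share of its commits) is an assumption standing in for the algorithms of the cited studies, and the input formats and function names are hypothetical.

```python
from collections import Counter


def infer_owners(commits):
    """Map each file to (top_author, share_of_commits).

    `commits` is an iterable of (author, file_path) pairs.
    """
    per_file = {}
    for author, path in commits:
        per_file.setdefault(path, Counter())[author] += 1
    owners = {}
    for path, counts in per_file.items():
        author, n = counts.most_common(1)[0]
        owners[path] = (author, n / sum(counts.values()))
    return owners


def accuracy(inferred, declared):
    """Fraction of files listed in `declared` whose inferred owner matches."""
    common = [path for path in declared if path in inferred]
    if not common:
        return 0.0
    hits = sum(1 for path in common if inferred[path][0] == declared[path])
    return hits / len(common)
```

Here `declared` would come from a public ownership list, such as a maintainers file, mapped to individual paths.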

Retired, 2025.

Do tired developers write bad code? 🚧

Considerable research has been performed to establish links between fatigue and both workplace safety and job performance. The objective of this research study is to look for such links in the software development setting. Two hypotheses that can be tested are 1) that tired developers cut corners by reducing the unit tests they write, and 2) that tired developers produce more buggy code. The hypotheses can be tested by examining code commits to see 1) when buggy code is being written, and 2) when new unit test code is being written. A model of working days and hours for a given developer can then be built, also based on the history of the developer’s commits. If the hypotheses are true, then relatively fewer unit test lines and more buggy code commits will occur at the end of the workday and at the end of the workweek.
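
A minimal sketch of the first measurement, assuming each commit has already been reduced to a timestamp plus counts of added test and non-test lines (the tuple format and function name are hypothetical):

```python
from collections import defaultdict
from datetime import datetime


def unit_test_share_by_hour(commits):
    """Return {hour: share of added lines that are test code}.

    `commits` is an iterable of (iso_timestamp, test_lines, other_lines).
    """
    totals = defaultdict(lambda: [0, 0])  # hour -> [test lines, all lines]
    for stamp, test_lines, other_lines in commits:
        hour = datetime.fromisoformat(stamp).hour
        totals[hour][0] += test_lines
        totals[hour][1] += test_lines + other_lines
    return {h: t / n for h, (t, n) in sorted(totals.items()) if n > 0}
```

Under the first hypothesis the share would drop in the last hours of a developer’s inferred workday.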

Readings and references

Gaba, David M., and Steven K. Howard. Fatigue among clinicians and the safety of patients. New England Journal of Medicine 347.16 (2002): 1249-1255. DOI: 10.1056/NEJMsa020846
Williamson, Ann, et al. The link between fatigue and safety. Accident Analysis & Prevention 43.2 (2011): 498-515. DOI: 10.1016/j.aap.2009.11.011
Noy, Y. I., Horrey, W. J., Popkin, S. M., Folkard, S., Howarth, H. D., & Courtney, T. K. (2011). Future directions in fatigue and safety research. Accident Analysis & Prevention, 43(2), 495-497. DOI: 10.1016/j.aap.2009.12.017
Jon Eyolfson, Lin Tan, and Patrick Lam. Do time of day and developer experience affect commit bugginess? MSR ’11: Proceedings of the 8th Working Conference on Mining Software Repositories. May 2011. pp. 153–162. DOI: 10.1145/1985441.1985464

Assigned, September 2022, retired 2025 due to marginal effect results.

Developer attitudes regarding code reviews

The objective of this study is to analyze an interesting 2019 Twitter stream started by Dan Luu on the topic in order to provide a taxonomy of developer attitudes regarding code reviews. This can be performed using content analysis. The findings can then be further examined using other methods such as a bibliographic survey, software repository mining, or a questionnaire survey of developers.

Retired 2025 due to dated data set.

npm: DLL hell reinvented

The term DLL hell used to refer to the problem encountered when applications installed on a Windows system relied on mutually incompatible dynamically-linked libraries. As secondary storage capacities increased, it was eventually solved by packaging and installing DLLs together with each application, rather than globally for the whole system. Current npm-based package distribution methods seem to face a similar problem. The objective of this study is to examine the problem in the context of Expo, a popular open-source platform for making universal native apps for Android, iOS, and the web with JavaScript and React. Following a case study of Expo, the work will identify the culprits, the reasons these exist, and, finally, propose, prototype, and evaluate a method for addressing the problem.

Retired 2025 due to shift in interests.

Citizen consultation in law making

By law, most new legislative proposals in Greece are to be placed online for public consultation before being sent to Parliament. The objective of this study is to determine the extent to which citizen comments are taken up by the government. This can be done by mining (e.g. through scraping) the public consultations and the corresponding texts sent to Parliament, finding the differences, and assessing these against the comments received in the consultation process. The assessment can be performed quantitatively, based on the number of changes and number of comments, and also qualitatively by manually examining some changes and comments. The study can be extended to also look at changes made at the Parliament stage by comparing the submitted law’s text against that published in the Official Journal.
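
The step of finding the differences between the consultation draft and the text sent to Parliament could look roughly like this, assuming both texts have already been scraped into plain strings with one provision per line (an assumption about the data, not a description of the actual sources):

```python
import difflib


def changed_passages(consultation_text, submitted_text):
    """Return (tag, old_lines, new_lines) for each differing block of lines."""
    old = consultation_text.splitlines()
    new = submitted_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

The number of returned blocks gives a simple quantitative measure, while each block can be matched manually against the comments on the corresponding article.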

Material

Implemented, 2023–2024 by Antonis Athanasiou; see also corresponding newspaper article.

Improving the performance of Docker builds

Dockerfiles specify a set of instructions from which we can derive Docker images. Docker images are then used to spawn lightweight containers used to run batch jobs and services in an isolated environment. Docker is now widely used in both industry and academia.

Many Dockerfiles have performance problems that degrade developers’ productivity. For example, assume that you need to rebuild your Docker image because you updated one of the source files it depends on. If the Dockerfile is not structured properly, Docker may take a long time to rebuild the image because it runs unnecessary commands that do not depend on the initial modification.

The goal of this project is to develop a refactoring technique for Dockerfiles. This technique will transform a poorly-structured Dockerfile so that, when we update a specific point, Docker will re-run only the instructions that depend on the update.

For example, consider the following Dockerfile:

    ADD scripts/ ${HOME}/scripts/
    # Time-consuming task
    RUN ./scripts/setup-sqlserver.sh
    ...
    RUN ./scripts/setup-orms.sh

The issue in the above Dockerfile is that when we modify the scripts/setup-orms.sh script, Docker will re-run the scripts/setup-sqlserver.sh script, even though the latter is irrelevant to the update.

Dockerfile after refactoring:

    ADD scripts/setup-sqlserver.sh ${HOME}/scripts/setup-sqlserver.sh
    # Time-consuming task
    RUN ./scripts/setup-sqlserver.sh
    ...
    ADD scripts/setup-orms.sh ${HOME}/scripts/setup-orms.sh
    RUN ./scripts/setup-orms.sh
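
A very rough sketch of how this particular transformation might be automated is shown below. It relies on a strong simplifying assumption: scripts are always invoked as `RUN ./dir/name`, and every file of the added directory that matters is referenced by such a `RUN`. The function name is hypothetical, and a real refactoring tool would need a proper Dockerfile parser and dependency analysis.

```python
import re


def split_directory_add(dockerfile_lines):
    """Replace `ADD dir/ dest/` with per-script ADDs placed just before each RUN.

    Lines that match neither pattern are copied through unchanged.
    """
    add_re = re.compile(r"^(\s*)ADD\s+(\S+)/\s+(\S+)/\s*$")
    run_re = re.compile(r"^(\s*)RUN\s+\./([^/\s]+)/(\S+)")
    result = []
    directories = {}  # source directory -> destination directory
    for line in dockerfile_lines:
        added = add_re.match(line)
        if added:
            directories[added.group(2)] = added.group(3)
            continue  # the directory-wide ADD is dropped
        run = run_re.match(line)
        if run and run.group(2) in directories:
            indent, src_dir, script = run.groups()
            dest = directories[src_dir]
            result.append(f"{indent}ADD {src_dir}/{script} {dest}/{script}")
        result.append(line)
    return result
```

With this layout a change to one script invalidates the build cache only from the `ADD` of that script onwards.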

Tools software engineers use

This empirical study will examine the tools used by software engineers by mining open source GitHub project workflows. By collecting what GitHub Apps and Actions are referenced through the workflows’ uses keyword, the study will first determine the popularity of specific extension types. It will also apportion these into categories to see which categories can benefit from tools. Finally, a qualitative analysis of the most popular tools in each category will distill work tasks that can be profitably automated.
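
The collection step could be approximated as in the sketch below, which scans workflow files for `uses:` entries with a regular expression instead of a full YAML parser. This shortcut and the function name are assumptions made to keep the example free of dependencies.

```python
import re
from collections import Counter

# Matches "uses: owner/repo@ref" with an optional list dash and quotes.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"#]+)", re.MULTILINE)


def count_actions(workflow_texts):
    """Count referenced actions, ignoring the version after '@'."""
    counts = Counter()
    for text in workflow_texts:
        for reference in USES_RE.findall(text):
            counts[reference.split("@", 1)[0]] += 1
    return counts
```

Calling `count_actions` on the contents of many `.github/workflows/*.yml` files and taking `most_common()` yields a popularity ranking that can then be categorized by hand.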

Implemented by others; see e.g. T. Kinsman, M. Wessel, M. A. Gerosa and C. Treude, “How Do Software Developers Use GitHub Actions to Automate Their Workflows?,” 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), Madrid, Spain, 2021, pp. 420-431, doi: 10.1109/MSR52588.2021.00054.

Visual development of data processing pipelines redux

Develop a development environment based on Blockly or Microsoft MakeCode where users can easily construct data processing pipelines by visually plugging together processing blocks. The pipeline will be implemented underneath using Unix tools, such as sort, wc, uniq, diff, join, tr, cut, head, tail. The tool command-line options will be made available through block parameters, ideally through a suitable meta-language that describes the parameters and their syntax. The work can be extended by adding nested mini-languages for awk, sed, and sh.
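
Underneath the visual editor, the blocks could be turned into a shell pipeline along the following lines. The block representation (a command name plus a list of already-resolved arguments) and the function name are hypothetical.

```python
import shlex


def build_pipeline(blocks):
    """Turn [(command, [args...]), ...] into a shell pipeline string."""
    stages = []
    for command, args in blocks:
        # Quote every word so that block parameters cannot break the shell syntax.
        stages.append(" ".join(shlex.quote(word) for word in [command, *args]))
    return " | ".join(stages)
```

For example, plugging together sort, uniq with `-c`, a reverse numeric sort, and head with `-n 5` would produce `sort | uniq -c | sort -rn | head -n 5`.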

References

Diomidis Spinellis. Unix tools as visual programming components in a GUI-builder environment. Software: Practice and Experience, 32(1):57–71, January 2002. doi:10.1002/spe.428.

Implemented, 2023–2024 by Klenti Cipi and Pantelis Kakavas; see the blockly_unix GitHub repository.

Detecting package dependencies through network tracing

Software supply chain management relies on the accurate detection of dependencies associated with a given project. The objective of this research is to evaluate existing systems for constructing a software bill of materials, and to improve upon the state of the art by implementing a tool based on network tracing. The proposed tool will build the software on a clean container and in parallel it will monitor network traffic to identify fetches of software components, which will be recorded as dependencies. The study will evaluate the constructed tool against other approaches.

Locating malicious software supply chain rings

The pervasiveness of free software reuse and the open nature of most free software projects can be exploited through the purposeful injection of vulnerabilities by malicious actors to implement software supply-chain attacks. It is conceivable that these actors work under fake identities and collaborate in groups in order to establish a credible online presence. The objective of this study is to develop and evaluate techniques for identifying such ring groups. The techniques may include social network analysis (the groups are likely to be closely-knit) and machine learning (the characteristics and behavior of the corresponding profiles may be outliers compared to those of real persons).

Implemented by others; see e.g. Anomalicious: automated detection of anomalous and potentially malicious commits on GitHub and Practical automated detection of malicious npm packages.

The effects of working from home on programming quality and productivity

Many modern developer workplaces have an open-office layout. While this obviously reduces office space costs, its proponents argue that it also enhances cooperation. On the other hand, detractors argue that such layouts make it difficult to concentrate, something that is anecdotally visible through the widespread adoption and use of over-the-ear noise-cancelling headphones. The lockdowns associated with the COVID-19 pandemic allowed the implementation of a natural experiment to determine the effect of such workplaces on programming quality and productivity. As developers started working from home and some might be able to isolate themselves from noise and interruptions, we might find code with fewer faults and more complex constructs. Or hindered cooperation might be visible through more emails and online messages.

The objective of the proposed study is to examine these correlations, controlling as much as possible other factors, such as a distracting home environment.

References

Tom DeMarco and Timothy R. Lister. Peopleware: Productive Projects and Teams. Dorset House Publishing, 1987.

Implemented by others; see e.g. Pandemic programming and Challenges and Gratitude: A Diary Study of Software Engineers Working From Home During Covid-19 Pandemic.

Using a neural network to determine processor microarchitecture

Modern cloud computing infrastructures have their hypervisors obscure the underlying processor microarchitecture from the client operating system. This happens for business reasons, but prevents clients from determining whether they are vulnerable to side channel attacks. The proposed work will address this shortcoming by training a neural network to recognize diverse processor microarchitectures. The training data will stem from the values of processor performance counters after executing diverse workloads.

A study of bugs found in configuration management systems

Bug studies are important for understanding the nature of the bugs that complex systems suffer from [1, 2]. This study aims to analyze and characterize previously reported bugs in popular configuration management systems (e.g., Puppet, Ansible, and Docker). Specifically, the study should answer some of the following questions:

What are the symptoms and root causes of these bugs?
Can we identify common patterns?
What triggers these bugs?
What’s their impact?
How do developers fix these bugs?
Are these bugs system-dependent, or do they manifest regardless of the state of the underlying system?

The output of this study should be a set of useful findings that can guide the design of future automated techniques for detecting bugs in configuration management systems.

References

Chaliasos, Stefanos, et al. “Well-typed programs can go wrong: A study of typing-related bugs in JVM compilers.” Proceedings of the ACM on Programming Languages 5.OOPSLA (2021): 1-30.
A. Di Franco, H. Guo and C. Rubio-González, “A comprehensive study of real-world numerical bug characteristics,” 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017, pp. 509-519, doi: 10.1109/ASE.2017.8115662.

A skeletal program enumeration algorithm for detecting soundness compiler bugs

Skeletal program enumeration (SPE) is an effective compiler testing technique, which works as follows. Given a program P with a specific syntactic structure, SPE enumerates all program variants that share the same structure as the original program P but exhibit different variable usage patterns. SPE has been applied in diverse compilers (i.e., GCC, LLVM, Dotty) and has found hundreds of crashes.
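
The core enumeration idea can be illustrated with the following simplified sketch, which treats a skeleton as a string with holes and enumerates variable usage patterns up to renaming. It deliberately ignores scopes and types, which an actual SPE implementation has to handle; the function names and the `{}` hole notation are assumptions for illustration.

```python
def usage_patterns(hole_count, variable_count):
    """Yield canonical variable assignments for `hole_count` holes.

    Each pattern is a tuple of variable indices in which a new index may
    only appear after all smaller ones have been used, so patterns that
    differ only by renaming variables are produced once.
    """
    def extend(prefix, used):
        if len(prefix) == hole_count:
            yield tuple(prefix)
            return
        for index in range(min(used + 1, variable_count)):
            yield from extend(prefix + [index], max(used, index + 1))

    yield from extend([], 0)


def fill(skeleton, names, pattern):
    """Instantiate a skeleton containing `{}` holes with the chosen names."""
    return skeleton.format(*(names[i] for i in pattern))
```

For a skeleton with three holes and three available variables this yields five variants rather than the 27 produced by naive enumeration.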

The goal of this project is to adapt the SPE algorithm to detect soundness compiler bugs. A soundness bug occurs when the compiler mistakenly accepts an ill-typed program. Detecting soundness bugs is highly important, as soundness bugs defeat the safety offered by type systems in statically-typed languages.

Given a program structure, the adapted SPE algorithm should perform a smart enumeration and produce program variants that share the same structure as the input but exhibit different kinds of errors in different locations.

Mutation-based testing for ORM systems

Recent work [1, 2] has introduced a generator that produces queries used to test the correctness of ORM implementations via differential testing.

The goal of this project is to develop mutation-based testing approaches for uncovering bugs in ORM systems. The effectiveness of these mutation-based techniques will be evaluated in terms of bug-finding capability and code coverage improvement. In addition, one will compare the developed techniques against the existing query generator [3].

References

T. Sotiropoulos, S. Chaliasos, V. Atlidakis, D. Mitropoulos and D. Spinellis, “Data-Oriented Differential Testing of Object-Relational Mapping Systems,” 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021, pp. 1535-1547, doi: 10.1109/ICSE43902.2021.00137.
Cynthia

Detecting bugs in GNU Octave and Matlab

GNU Octave is an open-source platform for performing scientific and numerical computations. GNU Octave aims to be compatible with Matlab.

The goal of this work is to develop a testing approach for detecting (inconsistency) bugs in GNU Octave. Specifically, one will implement a program generator, suitably crafted for detecting numerical bugs in GNU Octave. To establish a suitable test oracle, one can use differential testing against the GNU Octave and Matlab implementations.
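
The test oracle could follow the pattern sketched below. To keep the example self-contained, the two implementations are plain Python callables standing in for runs of the same generated program under GNU Octave and Matlab; the function name and the tolerance are assumptions.

```python
import math


def differential_test(inputs, implementations, rel_tol=1e-9):
    """Run each input on every implementation and report disagreements.

    `implementations` maps a name to a callable returning a float.
    """
    mismatches = []
    for value in inputs:
        results = {name: run(value) for name, run in implementations.items()}
        reference = next(iter(results.values()))
        if any(not math.isclose(r, reference, rel_tol=rel_tol) for r in results.values()):
            mismatches.append((value, results))
    return mismatches
```

Each reported mismatch is only a candidate bug: it still has to be checked manually to decide which implementation, if any, deviates from the documented behavior.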

Which are the key infrastructure open source projects?

The study will analyze dependencies among open source projects (e.g. specified in Maven or npm builds, or through FreeBSD ports dependencies) in order to construct a graph. The importance of many nodes can be seeded through download count data from package managers such as Choco and Brew. These data will then be analyzed, using e.g. the PageRank algorithm, to determine which are the most important and critical projects. Furthermore, the projects will be assessed in terms of the risk they pose to the community, based on characteristics such as the number of committers, licensing, security handling, product and process quality, issue management, and the freshness of commits.
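
A minimal power-iteration PageRank over a dependency graph might look as follows. The graph format and parameter values are assumptions, and the sketch starts from uniform ranks rather than seeding nodes with download counts.

```python
def pagerank(dependencies, damping=0.85, iterations=50):
    """Power-iteration PageRank over {package: [packages it depends on]}.

    Rank flows from a package to its dependencies, so widely
    depended-upon packages score highest.
    """
    nodes = set(dependencies)
    for targets in dependencies.values():
        nodes.update(targets)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Packages without dependencies spread their rank evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not dependencies.get(n))
        new_rank = {n: (1 - damping + damping * dangling) / count for n in nodes}
        for source, targets in dependencies.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank
```

Sorting the result by descending rank gives a first candidate list of key infrastructure projects, to be combined with the risk characteristics listed above.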

Nicolas Harrand, Amine Benelallam, César Soto-Valero, Olivier Barais, Benoit Baudry Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs
Markus Zimmermann and Cristian-Alexandru Staicu Small World with High Risks: A Study of Security Threats in the npm Ecosystem USENIX Security ’19.

Implemented by others; see Libraries.io.

Software that matters

The goal of the proposed project is to create an authoritative data set of widely-used software packages, and document its creation process in a way that allows others to replicate the data. The data set that will be created can be used for mining software repositories to test software engineering research questions. A similar approach was used in the past to create a popular data set of software projects following specific engineering practices. The work is divided into five tasks.

Create a data set of popular software applications. Popularity should be measured in an objective and replicable way. For open source software packages popularity can be measured using download counts from popular software hosting platforms; for commercial software popularity can be obtained from market surveys, articles, and public data sets. The list should identify the software, its creator, its source code hosting platform (for open source software), and its main distribution point. Where possible, the list should be augmented with the open source code packages (e.g. libraries) used by that software system. This can be obtained from its source code, by analyzing its binary for dynamic library imports, or by looking at its documentation.
Create a data set of popular Debian packages.
Create a data set of popular JavaScript packages.
Create a data set of popular Python packages.
Create a data set of popular Java packages.

All tasks should start with a systematic search for related work and existing studies, and report on their findings through a summary with suitable references. Also, all tasks should provide their results (data sets) together with the mechanism that can recreate them, e.g. through scripting. Tasks 2–5 can be seeded with input from task 1, as well as their own measures of popularity. Data sets for tasks 2–5 should include whether a package is a stand-alone application or a library, its repository, and its measures of popularity and use. A key element in measuring use will be the transitive closure of use counts, by adding uses by dependencies and dependencies of dependencies. For example, if a library a is used by another library b that is used by 100 applications, then a will also have a use count of 100.
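
The transitive use count could be computed as in this sketch, which counts for each package the applications that reach it through any chain of dependencies. The input format and function name are hypothetical.

```python
def use_counts(dependencies, applications):
    """Count, for each package, the applications depending on it directly or transitively.

    `dependencies` maps a package or application to the packages it uses directly.
    """
    counts = {}
    for app in applications:
        seen = set()
        stack = list(dependencies.get(app, ()))
        while stack:
            package = stack.pop()
            if package in seen:
                continue
            seen.add(package)
            stack.extend(dependencies.get(package, ()))
        seen.discard(app)  # guard against dependency cycles
        for package in seen:
            counts[package] = counts.get(package, 0) + 1
    return counts
```

Counting each application once per package avoids inflating the figure when an application reaches the same library through several paths.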

Implemented by others; see Libraries.io.

Qualitative analysis of software quality trends

Several studies have identified quantitative changes in several quality metrics of evolving software. Examples include the evolution of C programming practices in the Unix operating system, changes after a major OpenSSL security event, and the evolution of cyclomatic complexity in the Unix and GNU/Linux systems. The objective of this study is to perform a qualitative analysis of the underlying changes to determine a) what types of code changes led to the metric changes, and, b) what drove those code changes.

Implemented by others in 2020; see Piantadosi, V., Fierro, F., Scalabrino, S. et al. How does code readability change during software evolution?. Empir Software Eng 25, 5374–5412 (2020). DOI:10.1007/s10664-020-09886-9

The use and lifetime of URLs in source code comments

Comments in source code often reference web resources through URLs. The objective of this study is to mine open source software repositories for URLs and perform quantitative and qualitative analysis on them in order to answer the following research questions.

What is the density of URLs in source code?
What types of resources are linked through source code URLs?
What are the purposes served by embedding URLs in source code?
What is the lifetime of source code URLs in terms of their validity (i.e. their resolution to a valid URL resource)?

The study will conclude with recommendations for developers and their managers regarding the use of URLs in source code comments.
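
The mining step and the density measure of the first research question might be prototyped as below. The comment and URL patterns are crude assumptions (C-style and `#` comments only, no awareness of string literals beyond what the patterns happen to exclude), so results would need manual validation.

```python
import re

# Line comments (// and #) and C-style block comments.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/|#[^\n]*", re.DOTALL)
URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")


def comment_urls(source):
    """Return the URLs that appear inside comments of `source`."""
    urls = []
    for comment in COMMENT_RE.findall(source):
        urls.extend(url.rstrip(".,;") for url in URL_RE.findall(comment))
    return urls


def url_density(source):
    """URLs in comments per 1000 lines of source."""
    lines = source.count("\n") + 1
    return 1000 * len(comment_urls(source)) / lines
```

Checking the lifetime of each extracted URL would then require fetching it periodically, which is outside the scope of this sketch.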

Lawrence, S. and Giles, C.L. Accessibility of information on the Web. Nature 400 (1999), 107–109. doi:10.1038/21987
Diomidis Spinellis. The decay and failures of web references. Communications of the ACM, 46(1):71–77, January 2003. doi:10.1145/602421.602422

Implemented by others in 2019. See 9.6 million links in source code comments: purpose, evolution, and decay.

Implementation of tax identification number validation algorithms

The goal of this project is to add published tax identification number validation algorithms to the corresponding module of the extremely popular validator.js package. The project is suitable for students in their first three years of study.
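
As an example of what such an algorithm looks like, the sketch below checks a Greek tax identification number (AFM) using the commonly described scheme: the first eight digits are weighted by descending powers of two, and the sum modulo 11, then modulo 10, must equal the ninth digit. The choice of country and the function name are assumptions for illustration, and the sketch is written in Python rather than in the JavaScript used by validator.js.

```python
def is_valid_afm(tin):
    """Check the check digit of a nine-digit Greek tax identification number."""
    if len(tin) != 9 or not (tin.isascii() and tin.isdigit()):
        return False
    # Weights are 2**8 for the first digit down to 2**1 for the eighth.
    total = sum(int(digit) * 2 ** (8 - position) for position, digit in enumerate(tin[:8]))
    return total % 11 % 10 == int(tin[8])
```

Published algorithms for other countries follow the same general shape: a format check followed by a weighted checksum.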

Implemented in 2020.

Empirical analysis of labels used in issue reports

The project will analyze the labels used in issue reports on GitHub (GitHub Issue Tags). The goal is to answer the following research questions.

Which labels are used?
Why are labels used?
How are labels used?
What are the effects of using labels?

Implemented by others in 2021.

What should programmers know?

The goal of this project is to map the knowledge that programmers need. This will be done through the following steps.

Process the text of the StackOverflow.com site, as well as open source software, to find links to Wikipedia.
Statistically analyze the frequency of the links.
Qualitatively analyze the links by building a hierarchical tree and associating them with the corresponding ACM classification.

Implemented in 2020 and submitted for publication.

Translating code between programming languages using machine learning

The goal of this project is to exploit compilation infrastructures that support multiple programming languages, such as LLVM, to train a system that translates code between programming languages: from a source language to a target language, e.g. from C to Swift. Specifically, the proposed system will first compile the code from the source language into the compiler's intermediate language. It will then translate the code from the compiler's intermediate language into the target language using machine learning. The machine learning system is proposed to be based on neural networks. It will be trained on a corpus created by compiling existing code from the target language into the LLVM intermediate language.

Implemented by others in 2020. See Unsupervised Translation of Programming Languages and Deep learning to translate between programming languages.

Adding automatic keyboard language correction to AutoHotkey

Do you ever find yourself typing on your computer sta ellhnik;a, ςηιλε υου ςαντεδ το ςριτε ιν Ενγλιση (στα ελληνικά, i.e. "in Greek", while you wanted to write in English) or the other way round? The goal of this project is to extend the code of the AutoHotkey program so that it detects this problem (through the statistical analysis of n-grams) and then automatically corrects the language by deleting the wrong characters, inserting the correct ones, and switching the keyboard to the correct layout. It is recommended that the adaptation be parametric, so that others can add further language pairs in the future.
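
The correction step, once the wrong layout has been detected, amounts to remapping each character to the one produced by the same key in the other layout. The sketch below covers only the letter keys of the Greek and US layouts (the q key and accent handling are left out), and the table and function names are assumptions; the n-gram detection itself is not shown.

```python
# Letter keys of the US layout and the characters they produce in the Greek layout.
LATIN = "wertyuiopasdfghjklzxcvbnm"
GREEK = "ςερτυθιοπασδφγηξκλζχψωβνμ"

TO_GREEK = str.maketrans(LATIN + LATIN.upper(), GREEK + GREEK.upper())
TO_LATIN = str.maketrans(GREEK + GREEK.upper(), LATIN + LATIN.upper())


def retype(text, target):
    """Return `text` as if the same keys had been pressed in the `target` layout."""
    table = TO_GREEK if target == "greek" else TO_LATIN
    return text.translate(table)
```

Keeping the two strings as data is one way to make the approach parametric: another language pair only needs another pair of key tables.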

Implemented by me in 2020.

Finding undeclared dependencies in computer configuration management systems

Software that automates the configuration of computers, such as Puppet, requires each configuration element (e.g. the addition of a file) to include its prerequisites (e.g. the installation of the corresponding software). The aim of this project is to analyze graphs derived from the operation logs of such systems in order to find dependencies that are required but have not actually been declared. The proposed system will then suggest adding the corresponding dependencies, so as to avoid configuration failures.
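
The graph comparison at the heart of this idea could be sketched as follows, assuming the logs have already been reduced to observed (resource, prerequisite) pairs. The data format, the resource names in the usage note, and the function name are hypothetical.

```python
def missing_dependencies(declared, observed):
    """Return observed (resource, prerequisite) pairs not implied by declarations.

    `declared` maps a resource to the resources it explicitly requires;
    `observed` holds pairs derived from execution logs.
    """
    def reachable(start):
        # All prerequisites implied, directly or transitively, by declarations.
        seen, stack = set(), list(declared.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(declared.get(node, ()))
        return seen

    return sorted({(r, p) for r, p in observed if p not in reachable(r)})
```

If, for instance, the logs show a configuration file being written into a directory created by a package while the file resource does not require that package, the pair is reported as a dependency to add.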

Implemented in 2019. See Thodoris Sotiropoulos, Dimitris Mitropoulos, and Diomidis Spinellis. Practical fault detection in Puppet programs. In 42nd International Conference on Software Engineering, ICSE ’20, pages 26–37, 2020. doi:10.1145/3377811.3380384

Machine learning-based decompilation

Train a machine learning algorithm on the source and compiled code of software, so that it can decompile unknown programs.

Implemented in 2018. See Java decompiler using machine translation techniques, as well as the paper by Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao. A Neural-based Program Decompiler.

Metrics of successful websites and companies

The goal of this project is to investigate the correlation between metrics of a website's content and the success of the website or the corresponding company. Some possible metrics are the following:

Number of words and unique words
Number and resolution of images
Number of internal and external links
HTML validity
Page load speed

The pages to be studied will be those of the websites at the top of the Alexa rankings, as well as those of the Fortune 1000 companies.
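
Some of the content metrics above can be collected with the standard library alone, as in the following sketch. The class name and the rule for classifying links (a link is external when it names a different host) are assumptions; image resolution, HTML validity, and load speed would need additional tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageMetrics(HTMLParser):
    """Collect word, image, and link counts from one HTML page."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.words = []
        self.images = 0
        self.internal_links = 0
        self.external_links = 0
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            self.images += 1
        elif tag == "a":
            host = urlparse(dict(attrs).get("href") or "").netloc
            if host and host != self.site_host:
                self.external_links += 1
            else:
                self.internal_links += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.lower().split())
```

After calling `feed()` with a page's HTML, `len(parser.words)` and `len(set(parser.words))` give the word and unique word counts.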

Efficient representation of scientific genealogy trees

Scientific genealogy trees, such as this one, have two characteristics that allow them to be displayed in a particularly space-efficient way.

Most nodes have few parent nodes (most often just one).
The distances between connected nodes of different generations are small (usually one).

The goal of this project is to implement a tool in the Graphviz system (analogous to dot or fdp) that lays out such graphs efficiently by exploiting their special characteristics. To this end the tool will use as efficiently as possible the gaps that the dot tool currently leaves in each generation, while keeping the graph's edges in a single direction with no reverse edges (from left to right in this particular example).

Stochastic optimization of a CAD interface

A specific CAD application for architects and civil engineers has its commands and parameters categorized by entity and type. The goal of this project is to develop a technique for optimizing this categorization. To this end, a literature survey of the field must first be carried out. Next, a cost model must be constructed for executing commands and changing parameters in the existing program. Finally, by applying stochastic optimization algorithms to this cost model and to the history of five million commands issued in 56 thousand actual runs of the program, a new proposed command layout model must be created and evaluated, and conclusions must be drawn regarding the method followed and its results.

Implemented in 2018. See Alexander Lattas and Diomidis Spinellis. Echoes from space: Grouping commands with large-scale telemetry data. In 40th International Conference on Software Engineering: Software Engineering in Practice Track, ICSE-SEIP ’18, New York, NY, USA, May 2018. Association for Computing Machinery. doi:10.1145/3183519.3183545

Quality smells in Word/Excel/Powerpoint files

Quality smells can be used to detect poor quality in a digital document. Examples include the absence of styles in Word files, the failure to use names in Excel files, and formatting without master slides in Powerpoint files. The project will

collect freely available documents (e.g. through Google search or from the Enron data set),
formulate a catalog of smells for the type of documents to be studied, and
investigate their presence in the collected corpus of documents.

Evolution of security in the Unix source code

Based on a repository covering 45 years of the evolution of the Unix operating system's source code, the project will investigate how potential code vulnerabilities that could lead to security holes evolve. Vulnerabilities will be detected using suitable static code analysis tools, such as Flawfinder.

Implemented in 2019. See Charalambos Mitropoulos. 2019. Employing different program analysis methods to study bug evolution. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019). ACM, New York, NY, USA, 1202-1204. DOI: 10.1145/3338906.3342489 and the corresponding master's thesis.

A graphical interface for Unix commands

The commands available on the Unix command line are extremely powerful. However, configuring them with options is difficult and requires frequent references to their documentation, because it is hard to remember the available options and their names.

The goal of this project is to create a graphical interface for configuring the options of each command. It will display the available options suitably grouped and documented, allowing the user to select those needed. Based on the user's selections, the interface will dynamically compose the command in the form in which it must be executed.

The graphical interface will be configured for each command from a configuration file written in a simple domain-specific language that describes which options are available, what each one does, and how they can be combined. The initial version of this file can be generated from the command's source code and documentation.

Implemented in 2018. See the corresponding GitHub repository and the corresponding master's thesis.

A tool for reproducible research

Annotate text with data that can be reproduced
Identify and mark primary sources
Use taint checking and static analysis to distinguish data that come from other sources from data that were calculated on the fly.
Use cloud for long term storage

Most of this functionality is provided by R Reports and the Python Notebooks. See: Shen, Helen, Interactive Notebooks: Sharing the Code, Nature, 515(7525):151–152, 2014, DOI: 10.1038/515151a, and Ten Simple Rules for Reproducible Research in Jupyter Notebooks.

Empirical investigation of merge conflicts

Why do they occur?
Can they be avoided (e.g. through more frequent merges)?

Efficient back-in-time debugging with Intel PT technology

Intel PT technology allows the execution flow of a program's instructions to be recorded at a small performance cost. The goal of this project is to implement back-in-time debugging using this technology, by adapting the existing code of a debugger such as GDB. Although such debugging will not provide access to variables, it will allow perfectly accurate tracking of the program's flow, with performance such that it may not need to be explicitly enabled by the user. To reduce the memory required, dynamic compression of the recorded data is proposed. An overview of related techniques is available in this article.

This functionality is now supported by GDB's record pt command.