Code Structure Analysis
Services:
The list of services is constantly evolving and expanding.

Description of the services

This page describes the services that we offer. Other services related to source code analysis and to improving its structural integrity and quality are also possible. Ask for a quote using the contacts page. The work is typically done remotely under a strict non-disclosure agreement governed by US law. In some cases, the work can be done directly on the customer's site. In all cases, a customer representative reviews and accepts the changes before they are checked in.



Rarely used language features

Every programming language has many features. Some are used actively, while others are almost unknown. Developers are familiar with the actively used ones, so an average developer will most likely understand the code and fix or modify it without trouble if it contains only well-known language constructs. Rarely used features can be a source of problems. Not only will a developer spend more time on such code, but the compiler may fail to compile it. Programming standards in the compiler industry are high, but compilers are still not written by gods. The less often a feature is used, the more likely it is that some compiler handles it incorrectly, and not necessarily the current version of the compiler. The problem can appear after porting the project to another platform or upgrading to a new compiler version.

Production code should serve the business logic. It should not be a compiler test, and it should not raise the bar for the software engineers who will support the code later. Unfortunately, there is no golden rule here. Language features that are fine for one team of developers may create problems for another. The bar can be adjusted to each customer's requirements.

Our experience shows that scanning the source code for unusual language features, especially in macro processing, can improve the code.

Dead code identification

As a program grows, unused parts tend to appear. Modern linkers solve part of the problem: they do not include functions that are never called in the executable. A linker will not help with unused enum members, unused function parameters, and the like. Removing unused code saves time because developers have to read everything that is in a source file, and unused code is not highlighted there in any way.

Dead code identification can also be used to evaluate the completeness of test sets for libraries that provide an API. When the library code is scanned together with the test sets and the scan reveals that some function, static variable, class, or the like is not used, it may mean that there is no test case for it. Automated dead code detection combined with semi-automated dead code removal can bring the entire procedure into an affordable price range.
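
As a rough illustration of the idea, the following Python sketch looks for enum members that are never referenced after their declaration. It is not our toolkit: it works on plain text with regular expressions, ignores comments, strings, macros, and scoping, and the function name find_unused_enum_members is made up for this example. A real analysis needs a parser.

```python
import re

def find_unused_enum_members(source):
    """Return enum members that occur only once (at their declaration) in the text."""
    unused = []
    # Capture the body of each `enum [class] [Name] { ... }` declaration.
    for body in re.findall(r"\benum\b[^{;]*\{([^}]*)\}", source):
        for item in body.split(","):
            # Drop an explicit value such as `= 5` and the surrounding whitespace.
            name = item.split("=")[0].strip()
            if not re.fullmatch(r"[A-Za-z_]\w*", name):
                continue
            # Count whole-word occurrences of the member in the entire text.
            uses = len(re.findall(r"\b" + re.escape(name) + r"\b", source))
            if uses == 1:
                unused.append(name)
    return unused

SAMPLE = """
enum Color { RED, GREEN = 5, BLUE };
int pick(int x) { return x ? RED : BLUE; }
"""

print(find_unused_enum_members(SAMPLE))  # ['GREEN']
```

When the same scan is run over a library together with its test sets, the members it reports are candidates for missing test cases rather than for removal.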

Comments spell check

Like any other text in a human language, comments may contain spelling errors. Many software development tools lack a built-in spell checker for comments. When the source code is used only inside the company, errors in comments are typically not a big deal. Nevertheless, source code can be licensed to a third party, or it may become public. It makes sense to spell check the comments before releasing the source code outside the company. Ideally, comment spell checks are run on a regular basis.
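
A minimal sketch of such a check in Python is shown below. It assumes C/C++ style comments and a word list supplied by the caller; the names extract_comments, misspelled_words, and DICTIONARY are illustrative only. String and character literals are matched first so that a // inside a string is not mistaken for a comment.

```python
import re

# String and character literals are matched first so that comment markers
# inside them are skipped; then // line comments and /* block comments */.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|//[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.DOTALL,
)

def extract_comments(source):
    """Return the text of every comment, in source order."""
    return [m.group(0) for m in TOKEN.finditer(source) if m.group(0).startswith("/")]

def misspelled_words(comments, dictionary):
    """Return the sorted set of comment words that are not in the dictionary."""
    unknown = set()
    for comment in comments:
        for word in re.findall(r"[A-Za-z]+", comment):
            if word.lower() not in dictionary:
                unknown.add(word.lower())
    return sorted(unknown)

SAMPLE = '''
// Allocate the bufer
char *url = "http://example.com"; /* relase on exit */
'''
# Tiny stand-in for a real word list.
DICTIONARY = {"allocate", "the", "buffer", "release", "on", "exit"}

print(misspelled_words(extract_comments(SAMPLE), DICTIONARY))  # ['bufer', 'relase']
```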

Strings and messages spell check

Every software product has messages. Some of them are shown very often; these messages rarely contain spelling errors, and they read well. At the same time, the same software package contains messages that are shown rarely. Testers may never see them. This is where the analysis toolkit can help. It can easily extract all strings and run them through a spell checker.

Function names and variable names spell check

The industry uses meaningful names for functions, classes, variables, and so on. This makes sense. It also makes sense to enforce correct spelling of the words used in those names. Abbreviations are common, and they will show up as spelling errors. This is not a big deal: ignore them or add them to the list of known abbreviations. In many cases, at least in our experience, spelling mistakes spread through the clipboard. The name of a function is typed once and then never retyped or checked again. A spell check catches such things.
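
The sketch below shows one way this could work in Python: split each identifier into words and check every word against a dictionary and a list of known abbreviations. The splitting rules and the names split_identifier and check_names are assumptions made for this example, not a description of our toolkit.

```python
import re

def split_identifier(name):
    """Split snake_case, camelCase, and PascalCase names into lowercase words."""
    words = []
    # Underscores and digits separate words.
    for part in re.split(r"[_\d]+", name):
        # Either an uppercase run that is not followed by a lowercase letter
        # (such as "HTTP" in "HTTPServer"), or an optionally capitalized word.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [w.lower() for w in words]

def check_names(names, dictionary, abbreviations):
    """Map each identifier to the words that are neither known words nor abbreviations."""
    report = {}
    for name in names:
        bad = [w for w in split_identifier(name)
               if w not in dictionary and w not in abbreviations]
        if bad:
            report[name] = bad
    return report

# Tiny stand-ins for a real word list and a project abbreviation list.
DICTIONARY = {"get", "receive", "buffer", "max", "server"}
ABBREVIATIONS = {"len", "http"}

print(check_names(["getRecieveBuffer", "max_len", "HTTPServer"],
                  DICTIONARY, ABBREVIATIONS))
# {'getRecieveBuffer': ['recieve']}
```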

Custom algorithmic analysis

This analysis can differ in complexity. For example, one of our customers asked us to verify certain aspects of the string tables that were added to their executable as resources. Any pattern in the code can be traced and checked. Here is an example: the error message pattern. An error message should be declared as an enum member in one file. The text of the message should exist somewhere else (maybe in several different languages), and the error should be reported somewhere at least once. When the number of errors or messages exceeds 200, an automated system that tracks this pattern starts to make sense. It guarantees consistency between different places in the source code, which the compiler cannot verify. An important consideration is the amount of effort required to build and maintain such a system. Our company has the appropriate infrastructure for building such systems.
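
The error message pattern can be sketched in Python as follows. The ERR_ prefix, the report_error function, and the layout of the message table are all hypothetical; a real project will have its own conventions, and a real check would use a parser rather than regular expressions.

```python
import re

def check_error_pattern(enum_source, message_table, code_sources):
    """Cross-check declared error codes, their message texts, and their use sites."""
    # Codes declared as enum members (assumed to carry an ERR_ prefix).
    declared = set(re.findall(r"\b(ERR_\w+)\b", enum_source))
    # Codes passed to the (hypothetical) report_error function somewhere in the code.
    reported = set()
    for text in code_sources:
        reported |= set(re.findall(r"\breport_error\(\s*(ERR_\w+)", text))
    with_text = set(message_table)
    return {
        "missing_text": sorted(declared - with_text),
        "never_reported": sorted(declared - reported),
        "undeclared_text": sorted(with_text - declared),
    }

ENUM_SOURCE = "enum ErrorCode { ERR_OPEN, ERR_READ, ERR_WRITE };"
MESSAGES = {
    "ERR_OPEN": "Cannot open file",
    "ERR_READ": "Cannot read file",
    "ERR_SEEK": "Cannot seek in file",
}
CODE = ["if (!f) report_error(ERR_OPEN);", "report_error( ERR_WRITE );"]

print(check_error_pattern(ENUM_SOURCE, MESSAGES, CODE))
# {'missing_text': ['ERR_WRITE'], 'never_reported': ['ERR_READ'], 'undeclared_text': ['ERR_SEEK']}
```

Each of the three lists corresponds to one way the pattern can break: a code without a text, a code that is never reported, and a text without a code.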

More complex analysis is possible. This should be discussed in detail with the customer.



Conditional compilation simplification

Source code often contains support for old hardware or software platforms, extensions created for a specific customer, or experimental pieces of code that never went into production. In many cases, these pieces of code are disabled using conditional compilation. Sometimes, the code in these conditional areas is simply broken. Developers spend time trying to figure out whether they should modify it. These pieces litter the source code. Our company provides an effective solution for removing unneeded conditional compilation branches.

The first step is to review the existing conditional compilation keys. One way of doing this is described in the report. After deciding which keys should be removed, a special converter is used. The converter evaluates each conditional compilation expression. If an expression contains only parameters that should be removed, it evaluates to a constant. If this constant is FALSE, the whole #ifdef ... #endif construct is removed. If the constant is TRUE, only the conditional directives that surround the code are removed, while the code between them stays intact. If an expression cannot be reduced to a constant, both the conditional directives and the code between them stay. The real procedure is slightly more complex: a conditional expression can be simplified if it contains a mixture of removable and non-removable parameters; an #elif directive can be converted into #else or into #if if the conditional area above it disappears; #if can be converted into #ifdef; and so on.
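
The core of this idea can be sketched in a few lines of Python. This is a simplified illustration, not our converter: it resolves only #ifdef and #ifndef on a single key (with an optional #else), treats every #if expression as non-reducible, and does not handle #elif after a removed key. The function name simplify and the key names are made up for the example.

```python
import re

DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b\s*(\w*)")

def simplify(lines, keys):
    """Remove #ifdef/#ifndef blocks controlled by `keys` (name -> True if defined)."""
    out = []
    # One entry per open conditional: None means "keep as is",
    # True/False is the state of the current branch of a resolved conditional.
    stack = []

    def emitting():
        return all(state is not False for state in stack)

    for line in lines:
        m = DIRECTIVE.match(line)
        kind = m.group(1) if m else None
        if kind in ("ifdef", "ifndef") and m.group(2) in keys:
            defined = keys[m.group(2)]
            stack.append(defined if kind == "ifdef" else not defined)
        elif kind in ("ifdef", "ifndef", "if"):
            # Not reducible to a constant: keep the directive.
            if emitting():
                out.append(line)
            stack.append(None)
        elif kind in ("else", "elif"):
            if stack[-1] is None:
                if emitting():
                    out.append(line)
            elif kind == "else":
                stack[-1] = not stack[-1]  # switch to the other branch
            else:
                raise NotImplementedError("#elif after a removed key is not handled")
        elif kind == "endif":
            state = stack.pop()
            if state is None and emitting():
                out.append(line)
        elif emitting():
            out.append(line)
    return out

SOURCE = """\
int init() {
#ifdef OLD_HW
    setup_old();
#else
    setup_new();
#endif
#ifndef EXPERIMENTAL
    stable();
#endif
#ifdef USE_LOG
    log_start();
#endif
    return 0;
}
""".splitlines()

# Both keys are to be removed and treated as undefined.
KEYS = {"OLD_HW": False, "EXPERIMENTAL": False}
print("\n".join(simplify(SOURCE, KEYS)))
```

In the output, the OLD_HW block is reduced to its #else branch, the EXPERIMENTAL guard disappears while its code stays, and the USE_LOG block is left untouched because USE_LOG is not among the keys to remove.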

This procedure has been used many times and has proved to be reliable and practical. The results of the conversion are always presented to the customer for review before the final check-in.

Custom refactoring

Custom refactoring is an extremely wide area. One example is a software product that uses several different memory allocation systems or several different error-reporting systems, tracing systems, synchronization primitives, and the like. It is always beneficial to converge the product on a single subsystem of each kind. It turns out that a refactoring script can be created with reasonable effort. The script searches for all places where the subsystem to be replaced is used and modifies the code accordingly. Our infrastructure allows updating the source code without breaking the code formatting style.



Design your own code style

Many articles and books have championed source code style as a way to get excellent software. These ideas were especially popular about 20 years ago. One such well-known system is Hungarian notation. Time has passed, and the trumpets are silent now. In general, systems like Hungarian notation have not delivered what they claimed to deliver. Ease of reading and software quality do not come just from giving the right prefix to the name of a variable. Nevertheless, style regulations contain something positive.

It is easier to read source code that has a familiar formatting style. We argue that there is no "right style" for source code. Each team of developers can have its own style. The style of the code includes many aspects, such as using tabs or spaces, adopting a naming convention for functions or variables, and the like. It can govern how curly braces should be placed, whether there should be a space after a comma or around a plus sign, etc. The presence of a disclaimer at the beginning of the file is also part of the style. Files from different directories may require different disclaimers.

Formatting styles are often introduced by the manager of the team or by one of the influential developers. Once a style has emerged, it is probably better to keep it. Source code is not created only inside the team of developers. From time to time, code comes from outside. It is nice to convert the style of this code, or at least to fix the most offending aspects. We define a style by having the customer answer a questionnaire. As an aid, we can provide a report on the existing ratio between different styles of identifiers in the source code, the use of tabs, and the like. Based on the answers, the style check scanner is tuned. It reports violations of the defined style.
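
A report of this kind could look roughly like the Python sketch below. It counts lines indented with tabs versus spaces and classifies multi-word identifiers as snake_case, camelCase, or PascalCase. The categories, the regular expressions, and the name style_report are assumptions for this example; the sketch also does not exclude comments or string literals, which a real scanner would.

```python
import re
from collections import Counter

NAME_STYLES = {
    "snake_case": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+"),
    "camelCase": re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+"),
    "PascalCase": re.compile(r"(?:[A-Z][a-z0-9]+){2,}"),
}

def style_report(source):
    """Count indentation kinds per line and naming styles per distinct identifier."""
    indent = Counter()
    for line in source.splitlines():
        if line.startswith("\t"):
            indent["tabs"] += 1
        elif line.startswith(" "):
            indent["spaces"] += 1

    names = Counter()
    # Each distinct identifier is counted once; single-word names match no style.
    for ident in set(re.findall(r"\b[A-Za-z_]\w*\b", source)):
        for style, pattern in NAME_STYLES.items():
            if pattern.fullmatch(ident):
                names[style] += 1
                break
    return {"indent": dict(indent), "names": dict(names)}

SAMPLE = (
    "int read_file(int maxLen) {\n"
    "\tint line_count = 0;\n"
    "    return line_count + maxLen;\n"
    "}\n"
)

print(style_report(SAMPLE))
```

For the sample above, the report shows one tab-indented line, one space-indented line, two snake_case names, and one camelCase name, which is the kind of ratio that helps when answering the questionnaire.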

Convert existing code to a company style

Unfortunately, a fully automated procedure is not possible here. How much can be automated depends on the complexity of the style, the diversity of the sources, and the distance between the current style and the desired style. A combination of an automated procedure with a bit of manual editing gives acceptable results. We have a procedure that works together with the source code version control system. The converter generates changes in the code. After that, a diff is produced and the changes are reviewed. Most likely, the first attempt will be rejected for one reason or another. The version control system rolls the changes back. The desired style may be adjusted. The style converter is tuned and the conversion is rerun. After several attempts, the changes can be presented to the customer. This procedure with multiple conversion attempts has proved to be practical and convenient.

Regularly verify the code style conformance

Finally, the style checker is handed over to the customer. It makes sense to run style conformance verification on a regular basis, maybe once a week. It is even better to integrate this utility into the build process. Integration with the bug tracking system used by the customer can be discussed.



C/C++ front end

The C/C++ parser from our toolkit generates clear and easy-to-use data structures. We also have a prototype of a non-optimizing code generator for an abstract stack-based execution machine. The front end itself can be licensed. One of the advantages of our front end is that its code is new and does not have a decade-long history of changes that leaves code difficult to understand and modify. Our front end is compact and clean.

Compiler test sets

We have a set of compiler test files that were used during the development of our C++ front end. These files focus on forcing the compiler to go through specific parsing states according to the parsing table and to resolve specific conflicts of the grammar. For example, the Microsoft C++ compiler has the following problem. A template can have a parameter that is a class template, and this parameter can have a default value. When the source code specifies the name of a function template as the default value, the Microsoft compiler does not report any error. This is an example of a bug that can be discovered by routinely checking all possible "one step" deviations from the grammar rules.

Test sets can be developed on demand to address a specific type of language construct or compiler feature. In some cases, these tests will be generated automatically based on the grammar; in other cases, they will be written by hand.



Grammar development for custom languages

Custom languages are widely used in various applications. The primary focus of our company is grammar development when the customer has a good idea of what language they need or already has an informal description of the language. Nevertheless, it is possible to discuss participation in the development of the language requirements and of the language itself.

Grammar conflict resolution development

Relatively simple languages typically do not have grammar conflicts. Well-known tools, like YACC/Bison, are well suited for building front ends for such languages. More complex languages, like C++, do have conflicts. Conflict resolution techniques are required to develop a front end for such languages. Our company has experience in this area. We provide analysis that proposes strategies for resolving conflicts. One of our success stories involved an analysis of conflicts that helped to rework the grammar and completely remove the conflicts from it. After that, our customer simply used Bison.

Compiler front-end development for custom languages

The complexity of this work greatly depends on the language. This should be discussed directly with the customer.




Experience shows that the complexity of an analysis task is often significantly lower than the customer expects. Unfortunately, the opposite sometimes also happens.

Ask for a free, no-obligation quote using the contacts page.