Welcome to Code Structure Analysis
The purpose of this site is to promote faster software development and higher software quality.
This can be achieved through a better understanding of the source code and the dependencies in it. Nowadays,
as the software industry matures, the focus is shifting from writing new code to extending and modifying existing code.
It is relatively easy to modify your own recently written code. Modifying old code or somebody else's code is much
more difficult. Today, big teams work on huge code bases, and developers have to deal with unfamiliar code.
What was true for the code base yesterday may no longer be true today. Therefore, it is essential
to be able to get an accurate snapshot of the current state of the code. Analysis of the source code is often called
"static analysis". The term is correct, although it usually refers to looking for various flaws, especially
security flaws. Looking for flaws is an important application, but other applications are also possible.
In fact, source code analysis is needed by everybody who is involved in software development.
It is needed by a regular developer who is looking for a better way to modify a particular piece of code, by
a development manager who is evaluating the complexity of planned large-scale changes, and by a tester who is checking
the completeness of the existing test suite or thinking about ways to test a particular feature.
When speaking about algorithmic analysis, many people think of "proving the invariant of a loop".
Loop invariants have definite scientific value. At the same time, they are far removed from everyday development practice.
New approaches are needed to redefine the concept of algorithmic analysis and bring it closer to everyday needs.
No simple definition of source code analysis exists. The result of algorithmic analysis is any knowledge that can
be obtained from the source code, and the pieces of knowledge sought may differ drastically in complexity.
Here are some examples:
- Find places in the comments where a particular function is mentioned.
- Check whether a certain function can be called recursively (directly or indirectly).
- Figure out the external conditions that are needed for a certain line of code to be executed.
- Check whether the current test suite exercises all conditional compilation branches of the shipped product.
- Find places where race conditions may occur when the code is executed on multiple threads.
This list is not complete. There are as many types of analysis as there are needs in software development.
This means that there cannot be an ultimate tool that serves all possible needs. What is needed is an environment
that allows analysis tools to be created at a reasonable cost. This is the subject of this website.
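To make one of the listed examples concrete, the recursion check can be sketched as a reachability search over a call graph. This is only an illustration under stated assumptions: the `CallGraph` type and the functions `reaches` and `canRecurse` are hypothetical names, not part of the toolkit, and the graph is assumed to be already extracted from the code. A real analysis would also have to account for virtual calls and calls through function pointers.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: each function name maps to the functions it calls directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if `target` can be reached from `current` by following call edges.
static bool reaches(const CallGraph& graph, const std::string& current,
                    const std::string& target, std::set<std::string>& visited)
{
    auto it = graph.find(current);
    if (it == graph.end())
        return false;                       // no known callees for this function
    for (const std::string& callee : it->second) {
        if (callee == target)
            return true;                    // found a call path that ends at the target
        // Visit each callee only once so that cycles do not cause endless recursion.
        if (visited.insert(callee).second && reaches(graph, callee, target, visited))
            return true;
    }
    return false;
}

// A function can be called recursively (directly or indirectly)
// if it is reachable from itself in the call graph.
bool canRecurse(const CallGraph& graph, const std::string& function)
{
    std::set<std::string> visited;
    return reaches(graph, function, function, visited);
}
```

With a graph in which `f` calls `g` and `g` calls `f`, `canRecurse(graph, "f")` returns true, while a function that merely calls `f` from outside the cycle is reported as non-recursive.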
An infrastructure for automatic code modification, or refactoring, is also needed. This is closely tied to analysis.
For example, to modify the parameter list of a method, it is first necessary to know exactly where
this parameter list is located in the source code. Our toolkit provides various refactoring features.
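The link between analysis and refactoring can be illustrated with a small sketch. It assumes that the analysis step has already produced the exact character range of the parameter list; the `SourceSpan` record and the `replaceSpan` function are hypothetical names used only for this illustration, and the range is supplied by hand here rather than taken from a parser.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical record: a half-open character range [begin, end) in the source text.
struct SourceSpan {
    std::size_t begin;
    std::size_t end;
};

// Returns a copy of `source` in which the text covered by `span` is replaced.
std::string replaceSpan(const std::string& source, SourceSpan span,
                        const std::string& replacement)
{
    if (span.begin > span.end || span.end > source.size())
        throw std::out_of_range("span lies outside the source text");
    return source.substr(0, span.begin) + replacement + source.substr(span.end);
}
```

For the declaration `short M17(p1*);` the parameter list occupies characters 10 to 13, so `replaceSpan("short M17(p1*);", SourceSpan{10, 13}, "p1*, int flags")` yields `short M17(p1*, int flags);`. The rewrite itself is trivial; the hard part, which requires real parsing, is obtaining a reliable span.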
At the core of algorithmic analysis lie the parsing of the project's source code and the resulting code model.
To get an idea of what this means, look at a small C++ sample program:
#define En(x) Abcd ## x

struct S1
{
    template<class p1, int p2> class C2;
    enum En(5);
    friend class C3;
};

template<class p1, int p2>
class S1::C2
{
    p1 m_data[30];
    int Method16(int x) const { return((x > 16) ? x : p2); }
    template<template <int> class p3, class p4> struct S4;
};

enum S1::En(5)
{
    a, b = 20, c, d
};

template<class p1, int p2>
template<template <int> class p3, class p4>
struct S1::C2::S4
{
    short M17(p1*);
    short M18(p4*);
};
This is the same program after basic processing:
#define En(x) Abcd ## x

struct S1
{
    template<class p1, int p2> class C2;
    enum En(5);
    friend class C3;
};

template<class p1, int p2>
class S1::C2
{
    p1 m_data[30];
    int Method16(int x) const { return((x > 16) ? x : p2); }
    template<template <int> class p3, class p4> struct S4;
};

enum S1::En(5)
{
    a, b = 20, c, d
};

template<class p1, int p2>
template<template <int> class p3, class p4>
struct S1::C2::S4
{
    short M17(p1*);
    short M18(p4*);
};

One might think that the only difference is the syntax highlighting. This is true to a certain extent.
However, the complexity of syntax highlighting can vary greatly. Highlighting keywords in a different
color is a simple task. Highlighting local variables in one color and data members of a class in another is much
more complex because it requires understanding the meaning of each name.
This meaning can be retrieved only from the tables of a compiler front end.
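Why the meaning of a name cannot be recovered from its spelling alone can be shown with a heavily simplified sketch of scoped name lookup. The `NameKind` enumeration, the `Scope` alias, and the `classify` function are hypothetical and stand in for the much richer tables of a real compiler front end.

```cpp
#include <map>
#include <string>
#include <vector>

enum class NameKind { LocalVariable, DataMember, Unknown };

// Hypothetical, heavily simplified symbol table:
// one map per scope, with the innermost scope stored last.
using Scope = std::map<std::string, NameKind>;

// Decides how a name should be highlighted at a given point in the code.
NameKind classify(const std::vector<Scope>& scopes, const std::string& name)
{
    // Search from the innermost scope outward, so that a local variable
    // hides a data member with the same name.
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
        auto found = it->find(name);
        if (found != it->end())
            return found->second;
    }
    return NameKind::Unknown;
}
```

If the class scope declares `count` as a data member and the enclosing function declares a local variable `count`, the same spelling must be colored differently depending on where it appears, and only the scope information can tell the two apart.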
Look at the top-layer contents of the C++ database of definitions for this sample:
S1 /struct/
    C2 <class, __int32>
        m_data /p1 [30]/
        Method16 (__int32)
        S4 <template, class>
            M17 (p1*)
            M18 (p4*)
    Abcd5 /enum/
C3 /class/
This is a quick overview of the structure of the sample program. The picture contains only the top-layer objects.
Other, less important objects, such as method parameters and function bodies, are not displayed
for simplicity. A more detailed picture can also be viewed:
S1 /struct/                                 100
    C2 <class, __int32>                     101
        p1                                  102
        p2 /__int32/                        103
        p1 [30]                             107
        m_data /p1 [30]/                    108
        Method16 (__int32)                  109
            x /__int32/                     110
            Block                           111
                Return-Stmt                 112
                    Conditional-Expr        113
                        Binary-Expr         114
                            Operand-Expr    115
                            Operand-Expr    116
                        Operand-Expr        117
                        Operand-Expr        118
        S4 <template, class>                119
            p3                              120
                /__int32/                   121
            p4                              122
            M17 (p1*)                       129
                /p1*/                       130
            p4*                             131
            M18 (p4*)                       132
                /p4*/                       133
        p1*                                 128
    Abcd5 /enum/                            104
        a /0, 0x0/                          123
        b /20, 0x14/                        124
            Operand-Expr                    125
        c /21, 0x15/                        126
        d /22, 0x16/                        127
    FriendSpec                              106
__int32                                     19005451
C3 /class/                                  105
__int16                                     19005449
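Two details of the diagram above follow directly from C++ rules and can be checked with a compiler. The enum appears as Abcd5 because the `En(5)` macro pastes its argument onto `Abcd`, and the enumerators `c` and `d` receive the values 21 and 22 because each enumerator without an initializer takes the previous value plus one. The sketch below is a simplification: it declares the enum at namespace scope rather than inside `S1`, and `lastEnumerator` is a name introduced only for illustration.

```cpp
// The macro from the sample: En(5) expands to the single identifier Abcd5.
#define En(x) Abcd ## x

enum En(5)
{
    a, b = 20, c, d
};

// The enum really is named Abcd5, so this declaration compiles.
const Abcd5 lastEnumerator = d;

// Enumerators without an initializer take the previous value plus one.
static_assert(a == 0,  "the first enumerator defaults to 0");
static_assert(b == 20, "b has an explicit initializer");
static_assert(c == 21, "c is b + 1");
static_assert(d == 22, "d is c + 1");
```

The explicit initializer of `b` is also the reason why `b` is the only enumerator in the diagram that has an Operand-Expr child.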
The clarity and completeness of these diagrams can be discussed later, although they may not be that bad.
The simplicity, clarity, and completeness of the API that provides access to the underlying structure of the code are more important.
This API allows writing various analysis scripts that look for certain things or verify various aspects of the code.
The complexity of these scripts varies. Our toolkit allows writing them in a reasonable time. This means that it is possible
to write scripts that conduct valuable analyses for customers on demand. This website presents examples of such analyses.
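To give a feeling for what such an analysis script might look like, here is a minimal sketch that walks a tree of definitions and collects the qualified names of all methods. It is not the toolkit's actual API: the `Node` structure, the kind strings, and the functions `collectByKind` and `makeSampleModel` are assumptions made for this illustration, and the model is built by hand from the top-layer diagram of the sample program.

```cpp
#include <string>
#include <vector>

// Hypothetical code-model node; not the toolkit's real API.
struct Node {
    std::string name;            // for example "Method16"
    std::string kind;            // for example "struct", "class", "method", "enum"
    std::vector<Node> children;  // nested definitions
};

// Collects the qualified names of all nodes of the given kind, depth first.
void collectByKind(const Node& node, const std::string& kind,
                   const std::string& prefix, std::vector<std::string>& out)
{
    const std::string qualified = prefix.empty() ? node.name : prefix + "::" + node.name;
    if (node.kind == kind)
        out.push_back(qualified);
    for (const Node& child : node.children)
        collectByKind(child, kind, qualified, out);
}

// Hand-built model of the S1 part of the sample program (top layer only).
Node makeSampleModel()
{
    Node s4{"S4", "struct", {Node{"M17", "method", {}}, Node{"M18", "method", {}}}};
    Node c2{"C2", "class", {Node{"m_data", "field", {}}, Node{"Method16", "method", {}}, s4}};
    return Node{"S1", "struct", {c2, Node{"Abcd5", "enum", {}}}};
}
```

Calling `collectByKind(makeSampleModel(), "method", "", methods)` fills `methods` with `S1::C2::Method16`, `S1::C2::S4::M17`, and `S1::C2::S4::M18`. More demanding analyses follow the same pattern of traversing the model and testing each object against a condition.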
This website is not selling analysis tools. Tools are never perfect, and learning to use them requires effort.
In the real world, the customer does not know in advance whether an advertised analysis or refactoring tool will solve the problem at hand.
This knowledge can be gained only after getting the tool and spending time with it. Both the required amount of time and the result are
unclear. Our company decided to take an alternative approach: to offer a consultancy service. In this case,
the customer learns in advance whether the problem can be solved and gets a realistic estimate of its complexity.
The customer pays for actual results.