How to count the number of lines of code in a PHP project
This is part of a series of posts on Enterprise Laravel.
Warning: This post is over a year old. I don't always update old posts with new information, so some of this information may be out of date.
I'm giving a talk soon about Laravel and "the enterprise", and the concept of LOC (lines of code) keeps coming up. It turns out that's actually a much harder number to discover than you might think, so I figured I would write up a few options here.
For what it's worth, I'm not a big fan of LOC as a measure of any importance, but it can at least give us a rough basis for talking about broad differences in project size. If you were to ask me, I would say we shouldn't even think about it. But we don't always have that luxury.
TL;DR: use PHPLOC
If you don't want to read a half dozen options, use PHPLOC. You can find a longer description below, but here's the quick start guide:
cd Sites/
wget -O phploc.phar https://phar.phpunit.de/phploc.phar
php phploc.phar --exclude vendor --exclude node_modules myprojectnamehere/
Grab the Non-Comment Lines of Code and Logical Lines of Code numbers; they'll be your most useful comparisons across projects.
Note that you can also exclude framework-specific cache and log directories and whatever else helps you get the best number.
OK, let's look into all of the options. Please note that some of these tools count all lines of code, not just PHP. When possible, I've passed filters to them to just count PHP files.
Installable command-line tools
PHPLOC
PHPLOC is a project from Sebastian Bergmann, the creator of PHPUnit, which gives simple and easy LOC counts contextualized for PHP.
PHPLOC also gives other PHP-specific metrics, like cyclomatic complexity, number of classes, average class length, average method length, and more.
Here's a sample PHPLOC report:
$ phploc src
phploc 4.0.0 by Sebastian Bergmann.

Directories                                          3
Files                                               10

Size
  Lines of Code (LOC)                             1882
  Comment Lines of Code (CLOC)                     255 (13.55%)
  Non-Comment Lines of Code (NCLOC)               1627 (86.45%)
  Logical Lines of Code (LLOC)                     377 (20.03%)
    Classes                                        351 (93.10%)
      Average Class Length                          35
        Minimum Class Length                         0
        Maximum Class Length                       172
      Average Method Length                          2
        Minimum Method Length                        1
        Maximum Method Length                      117
    Functions                                        0 (0.00%)
      Average Function Length                        0
    Not in classes or functions                     26 (6.90%)

Cyclomatic Complexity
  Average Complexity per LLOC                     0.49
  Average Complexity per Class                   19.60
    Minimum Class Complexity                      1.00
    Maximum Class Complexity                    139.00
  Average Complexity per Method                   2.43
    Minimum Method Complexity                     1.00
    Maximum Method Complexity                    96.00

Dependencies
  Global Accesses                                    0
    Global Constants                                 0 (0.00%)
    Global Variables                                 0 (0.00%)
    Super-Global Variables                           0 (0.00%)
  Attribute Accesses                                85
    Non-Static                                      85 (100.00%)
    Static                                           0 (0.00%)
  Method Calls                                     280
    Non-Static                                     276 (98.57%)
    Static                                           4 (1.43%)

Structure
  Namespaces                                         3
  Interfaces                                         1
  Traits                                             0
  Classes                                            9
    Abstract Classes                                 0 (0.00%)
    Concrete Classes                                 9 (100.00%)
  Methods                                          130
    Scope
      Non-Static Methods                           130 (100.00%)
      Static Methods                                 0 (0.00%)
    Visibility
      Public Methods                               103 (79.23%)
      Non-Public Methods                            27 (20.77%)
  Functions                                          0
    Named Functions                                  0 (0.00%)
    Anonymous Functions                              0 (0.00%)
  Constants                                          0
    Global Constants                                 0 (0.00%)
    Class Constants                                  0 (0.00%)
You can require it globally or per project with Composer or, my preferred method, just download the .phar, run it, then delete it once you're done.
Here's what I used:
php phploc.phar --exclude vendor --exclude node_modules myproject
CLOC
CLOC is one of the longer-running and smartest programs for counting lines of code. It can differentiate languages and also separate empty lines and comment lines from real lines of code.
It can also pull from archives and git repositories, diff two versions of a codebase, pull from specific commits, ignore files and folders matching specific patterns, and it's installable via Brew, NPM, two Windows package managers, and all the major Linux package managers.
Here's an example:
prompt> cloc gcc-5.2.0/gcc/c
16 text files.
15 unique files.
3 files ignored.
https://github.com/AlDanial/cloc v 1.65 T=0.23 s (57.1 files/s, 188914.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               10           4680           6621          30812
C/C++ Header                     3             99            286            496
-------------------------------------------------------------------------------
SUM:                            13           4779           6907          31308
-------------------------------------------------------------------------------
Because CLOC is language-agnostic, it's not going to provide the same quality or diversity of metrics as PHPLOC.
Here's what I used:
cloc --exclude-dir=vendor,node_modules myproject
IDE-specific
PHPStorm
I found a plugin for PHPStorm called Statistic that gives you the total number of lines of code across your whole project and broken down by file type.
Sublime Text
I found this Gist, which harnesses the regex capabilities of Sublime Text search, and makes it easy to specify which file types and folders you want to include or exclude.
I used this version (from a comment, which ignores white space lines):
^.*\S+.*$
Make sure to exclude the right directories. Here's my list for a generic PHP project:
-./vendor/*,-./node_modules/*,-./.git/*,*.php
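If you don't have Sublime Text handy, the same idea (count only lines that contain at least one non-whitespace character) can be approximated from a shell. This is a minimal sketch, not the Gist's method: the function name count_nonblank_php_lines is made up for illustration, and it assumes a POSIX-like shell with find and awk and files with Unix line endings.

```shell
# Hypothetical helper: count lines that contain at least one non-whitespace
# character in every .php file under a directory. Any directory named
# vendor, node_modules, or .git is skipped, at any depth.
count_nonblank_php_lines() {
  dir="${1:-.}"
  find "$dir" \
    \( -name vendor -o -name node_modules -o -name .git \) -prune \
    -o -type f -name '*.php' \
    -exec awk 'NF { n++ } END { print n + 0 }' {} + |
    awk '{ total += $1 } END { print total + 0 }'   # sum the per-batch counts
}
```

Run it as count_nonblank_php_lines myproject. Note one difference from the Sublime filter above: this prunes matching directory names anywhere in the tree, not just at the project root.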
Manual terminal commands
The find command
This is definitely one of the less precise measures, but it also doesn't require you to have anything else installed, and it gives you the ability to include and exclude specific patterns for files and folders.
find . -type f ! -path './vendor/*' ! -path './node_modules/*' ! -path './.git/*' ! -name '*.log' -name '*.php' | xargs wc -l
As you can see, we're excluding the two vendor directories and the git directory, and the two -name tests show how to exclude and include specific file patterns.
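One caveat with piping into xargs wc -l: file names containing spaces can trip it up, and on large projects xargs may run wc more than once, which gives you several "total" lines instead of one. Here is a hedged variant that prints a single number. The function name count_php_lines is hypothetical, and the `! -name '*.log'` test is left out because `-name '*.php'` already rules out log files.

```shell
# Hypothetical helper: print one total line count for all .php files under
# a directory, ignoring the top-level vendor/, node_modules/, and .git/.
# The function body runs in a subshell, so the cd does not leak out.
count_php_lines() (
  cd "${1:-.}" || exit 1
  find . -type f \
    ! -path './vendor/*' \
    ! -path './node_modules/*' \
    ! -path './.git/*' \
    -name '*.php' \
    -exec cat {} + | wc -l | tr -d ' '   # tr strips the padding some wc builds add
)
```

Because wc -l counts newline characters, the total from the concatenated stream is the same as adding up the per-file wc -l numbers.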
Thanks to Jake Bathman at Tighten for helping me get this command working correctly.
Silver Searcher (ag)
If you have Silver Searcher installed, you can try this:
ag -l --php --ignore-dir=vendor --ignore-dir=node_modules --ignore-dir=public --ignore-dir=storage | xargs wc -l
Thanks to Daniel Coulbourne at Tighten for this one.
FAQ
- Should I include vendor directories?
In my opinion, no.
- Should I include comments and white spaces?
In my opinion, no.
- Is LOC a good indicator of complexity or significance?
In my opinion, no.
- What's cyclomatic complexity?
You can learn more by reading Wikipedia's article, but in essence it's a measure that attempts to quantify how complicated code is by counting the independent paths through it. It's fun to look at, but the total necessarily grows as a project grows, so it's not really an easy-to-use number, and there have been suggestions that it doesn't really correlate with a higher number of bugs. So... is it fun? Yes. Useful? In my opinion, no. Some complexity numbers (average lines per method and class, for example) can be more useful at times. If you care, you can get these complexity numbers from PHPLOC and another tool called PhpMetrics; take a look at this Laravel News post on code complexity to learn more.
- What are logical lines of code?
Also known as "effective lines of code", these are the lines of code which actually instruct the parser to do something. As far as I can tell, this is to differentiate them from templates, headers, comments, and ineffective code; additionally, multiple instructions on the same line should count as multiple LLOC.
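To make the "no comments, no white space" answers above a bit more concrete, here is a rough sketch of what a non-comment count involves for a single file. It is not how PHPLOC works (PHPLOC understands PHP; this is plain line matching), and the function name rough_ncloc is made up. It assumes comments sit on their own lines, so it will miscount comment markers inside strings or heredocs, and it treats a line that starts with a block comment as all comment.

```shell
# Hypothetical helper: rough non-blank, non-comment line count for one file.
rough_ncloc() {
  awk '
    in_block {                               # inside a multi-line block comment
      if ($0 ~ /\*\//) in_block = 0          # the block ends on this line
      next
    }
    /^[ \t]*$/ { next }                      # blank or whitespace-only line
    /^[ \t]*\/\// { next }                   # a // comment on its own line
    /^[ \t]*#/ && !/^[ \t]*#\[/ { next }     # a # comment, but keep #[Attribute]
    /^[ \t]*\/\*/ {                          # line starts a block comment
      if ($0 !~ /\*\//) in_block = 1         # stay in comment mode if not closed
      next
    }
    { n++ }                                  # everything else counts as code
    END { print n + 0 }
  ' "$1"
}
```

Lines with trailing comments still count as code here, which is usually what you want. For numbers you plan to compare across projects, a real tool is still the better choice.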
LOC results from each tool on the same project
I ran all of these tools on the same project, Symposium (one of Tighten's open source projects), to see how they compare.
PHPLOC
wget -O phploc.phar https://phar.phpunit.de/phploc.phar
php phploc.phar --exclude vendor --exclude node_modules symposium/
Results:
- LOC: 13,609
- Comment Lines of Code: 2,431
- Non-comment Lines of Code: 11,178
- Logical Lines: 2,958
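These four numbers relate to each other in a simple way: comment lines plus non-comment lines add up to total LOC, and in the sample report earlier the comment and logical-line percentages are taken against total LOC (255 of 1882 is 13.55%). A small sketch of that arithmetic, with the helper name pct being my own invention:

```shell
# Numbers copied from the PHPLOC result above.
loc=13609
cloc=2431
ncloc=11178
lloc=2958

# Hypothetical helper: percentage of part in whole, to two decimals.
pct() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f", 100 * part / whole }'
}

echo "CLOC + NCLOC = $((cloc + ncloc)) (LOC reported: $loc)"
echo "Comments: $(pct "$cloc" "$loc")% of LOC"
echo "Logical lines: $(pct "$lloc" "$loc")% of LOC"
```

The first line prints 13609 on both sides, which is a quick sanity check that you copied the right numbers out of the report.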
CLOC
brew install cloc
cloc --exclude-dir=vendor,node_modules symposium/
Results:
- LOC: 26,068
- Comments: 4,296
- PHP LOC: 7,653
- PHP Comments: 2,444
PHPStorm
Jose Soto ran these for me.
Results:
- LOC: 52,168
- PHP LOC: 12,096
- PHP source code LOC: 7,849
- PHP comment LOC: 2,406
Note: In order to get this plugin to give me good results, Jose had to delete the vendor/ and node_modules/ directories.
Sublime Text
With regex enabled, find ^.*\S+.*$ in:
-./vendor/*,-./node_modules/*,-./.git/*,*.php
And then look at the results at the bottom.
Results:
- PHP LOC: 11,468
Manual find command
cd symposium
find . -type f ! -path './vendor/*' ! -path './node_modules/*' ! -path './.git/*' ! -name '*.log' -name '*.php' | xargs wc -l
Results:
- PHP LOC: 13,609
Silver Searcher (ag)
cd symposium
ag -l --php --ignore-dir=vendor --ignore-dir=node_modules | xargs wc -l
Results:
- PHP LOC: 11,795
Conclusion
Wow. This post took way longer than I expected. Kudos to you for reading this long. Geez. I am tired.
In the end, I'd still recommend PHPLOC if you can. It is the most contextualized and provides additional details several others don't. It makes it easy to exclude vendor directories. It's good. That's all.
Comments? I'm @stauffermatt on Twitter