Jun 5, 2018

How to count the number of lines of code in a PHP project

Series

This is a series of posts on Enterprise Laravel.

Warning: This post is over a year old. I don't always update old posts with new information, so some of this information may be out of date.

I'm giving a talk soon about Laravel and "the enterprise", and the concept of LOC (lines of code) keeps coming up. It turns out that's actually a much harder number to discover than you might think, so I figured I would write up a few options here.

For what it's worth, I'm not a big fan of LOC as a measure of any importance, but it can at least give us a rough basis for talking about broad differences in project size. If you were to ask me, I would say we shouldn't even think about it. But we don't always have that luxury.

TL;DR: use PHPLOC

If you don't want to read a half dozen options, use PHPLOC. You can find a longer description below, but here's the quick start guide:

cd Sites/
wget https://phar.phpunit.de/phploc.phar
php phploc.phar --exclude vendor --exclude node_modules myprojectnamehere/

Grab the Non-Comment Lines of Code and Logical Lines of Code numbers; they'll be your most useful comparisons across projects.

Note that you can also exclude framework-specific cache and log directories and whatever else helps you get the best number.
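If the list of excluded directories grows, the repeated --exclude flags get tedious to type. Here's a rough sketch of a shell helper that builds them from a list of directory names. The function name phploc_cmd is made up for illustration, and storage is only an example of a framework directory you might want to skip.

```shell
# Hypothetical helper: prints a PHPLOC command with one --exclude flag
# per directory name, so it can be reviewed before it is run.
# Usage: phploc_cmd <project-dir> <dir-to-exclude>...
phploc_cmd() {
  project="$1"
  shift
  cmd="php phploc.phar"
  for dir in "$@"; do
    cmd="$cmd --exclude $dir"
  done
  printf '%s %s\n' "$cmd" "$project"
}

# Example: also skip a storage directory (an assumed name).
phploc_cmd myprojectnamehere/ vendor node_modules storage
```

It only prints the command; copy it or pipe it to sh once the list looks right.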


OK, let's look into all of the options. Please note that some of these tools count all lines of code, not just PHP. When possible, I've passed filters to them to just count PHP files.

Installable command-line tools

PHPLOC

PHPLOC is a project from Sebastian Bergmann, the creator of PHPUnit, which gives simple and easy LOC counts contextualized for PHP.

PHPLOC also gives other PHP-specific metrics, like cyclomatic complexity, number of classes, average class length, average method length, and more.

Here's a sample PHPLOC report:

$ phploc src
phploc 4.0.0 by Sebastian Bergmann.

Directories                                          3
Files                                               10

Size
  Lines of Code (LOC)                             1882
  Comment Lines of Code (CLOC)                     255 (13.55%)
  Non-Comment Lines of Code (NCLOC)               1627 (86.45%)
  Logical Lines of Code (LLOC)                     377 (20.03%)
    Classes                                        351 (93.10%)
      Average Class Length                          35
        Minimum Class Length                         0
        Maximum Class Length                       172
      Average Method Length                          2
        Minimum Method Length                        1
        Maximum Method Length                      117
    Functions                                        0 (0.00%)
      Average Function Length                        0
    Not in classes or functions                     26 (6.90%)

Cyclomatic Complexity
  Average Complexity per LLOC                     0.49
  Average Complexity per Class                   19.60
    Minimum Class Complexity                      1.00
    Maximum Class Complexity                    139.00
  Average Complexity per Method                   2.43
    Minimum Method Complexity                     1.00
    Maximum Method Complexity                    96.00

Dependencies
  Global Accesses                                    0
    Global Constants                                 0 (0.00%)
    Global Variables                                 0 (0.00%)
    Super-Global Variables                           0 (0.00%)
  Attribute Accesses                                85
    Non-Static                                      85 (100.00%)
    Static                                           0 (0.00%)
  Method Calls                                     280
    Non-Static                                     276 (98.57%)
    Static                                           4 (1.43%)

Structure
  Namespaces                                         3
  Interfaces                                         1
  Traits                                             0
  Classes                                            9
    Abstract Classes                                 0 (0.00%)
    Concrete Classes                                 9 (100.00%)
  Methods                                          130
    Scope
      Non-Static Methods                           130 (100.00%)
      Static Methods                                 0 (0.00%)
    Visibility
      Public Methods                               103 (79.23%)
      Non-Public Methods                            27 (20.77%)
  Functions                                          0
    Named Functions                                  0 (0.00%)
    Anonymous Functions                              0 (0.00%)
  Constants                                          0
    Global Constants                                 0 (0.00%)
    Class Constants                                  0 (0.00%)

You can require it globally or per project with Composer, or (my preferred method) just download the .phar, run it, and delete it once you're done.

Here's what I used:

php phploc.phar --exclude vendor --exclude node_modules myproject

CLOC

CLOC is one of the longer-running and smartest programs for counting lines of code. It can differentiate languages and also separate empty lines and comment lines from real lines of code.

It can also pull from archives and git repositories, diff two versions of a codebase, pull from specific commits, ignore files and folders matching specific patterns, and it's installable via Brew, NPM, two Windows package managers, and all the major Linux package managers.

Here's an example:

prompt> cloc gcc-5.2.0/gcc/c
      16 text files.
      15 unique files.
       3 files ignored.

https://github.com/AlDanial/cloc v 1.65  T=0.23 s (57.1 files/s, 188914.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               10           4680           6621          30812
C/C++ Header                     3             99            286            496
-------------------------------------------------------------------------------
SUM:                            13           4779           6907          31308
-------------------------------------------------------------------------------

Because CLOC is language-agnostic, it's not going to provide the same quality or diversity of metrics as PHPLOC.

Here's what I used:

cloc --exclude-dir=vendor,node_modules myproject

IDE-specific

PHPStorm

I found a plugin for PHPStorm called Statistic that gives you the total number of lines of code across your whole project and broken down by file type.

Screenshot of PHPStorm Statistic Plugin

Sublime Text

I found this Gist, which harnesses the regex capabilities of Sublime Text search, and makes it easy to specify which file types and folders you want to include or exclude.

I used this version (from a comment), which ignores whitespace-only lines:

^.*\S+.*$

Make sure to exclude the right directories. Here's my list for a generic PHP project:

-./vendor/*,-./node_modules/*,-./.git/*,*.php
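For comparison, roughly the same non-blank-line count can be taken in the terminal. This is a sketch rather than anything from the Gist: the function name count_nonblank_php is made up here, and it assumes a POSIX find and grep.

```shell
# Hypothetical helper: counts lines that contain at least one
# non-whitespace character across all .php files, skipping vendor/,
# node_modules/, and .git/ at any depth.
# Usage: count_nonblank_php [project-dir]   (defaults to the current directory)
count_nonblank_php() {
  find "${1:-.}" -type f -name '*.php' \
    ! -path '*/vendor/*' ! -path '*/node_modules/*' ! -path '*/.git/*' \
    -exec cat {} + | grep -c '[^[:space:]]'
}
```

Because the files are concatenated before counting, a file that doesn't end in a newline can make the total off by one.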

Manual terminal commands

The find command

This is definitely one of the less precise measures, but it also doesn't require you to have anything else installed, and it gives you the ability to include and exclude specific patterns for files and folders.

find . -type f ! -path './vendor/*' ! -path './node_modules/*' ! -path './.git/*' ! -name '*.log' -name '*.php' | xargs wc -l

As you can see, we're excluding the two vendor directories and the git directory. The command also shows how to exclude a file pattern (! -name '*.log') and include one (-name '*.php').

Thanks to Jake Bathman at Tighten for helping me get this command working correctly.
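One caveat with piping to xargs wc -l: file names containing spaces get split apart, and on very large projects xargs may run wc more than once, which prints several partial totals instead of one. Here's a variation that sidesteps both, offered as a sketch (the function name count_php_lines is hypothetical). It lets find hand the files to cat and counts the combined stream once.

```shell
# Hypothetical helper: prints a single total line count for all .php
# files, skipping vendor/, node_modules/, and .git/ at any depth.
# Usage: count_php_lines [project-dir]   (defaults to the current directory)
count_php_lines() {
  find "${1:-.}" -type f -name '*.php' \
    ! -path '*/vendor/*' ! -path '*/node_modules/*' ! -path '*/.git/*' \
    -exec cat {} + | wc -l | tr -d ' '   # tr strips the padding some wc versions add
}

count_php_lines .
```

The trade-off is that you lose the per-file breakdown wc -l gives you; you get only the total.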

Silver Searcher (ag)

If you have Silver Searcher installed, you can try this:

ag -l --php --ignore-dir=vendor --ignore-dir=node_modules --ignore-dir=public --ignore-dir=storage | xargs wc -l

Thanks to Daniel Coulbourne at Tighten for this one.

FAQ

  • Should I include vendor directories?
    In my opinion, no.
  • Should I include comments and white spaces?
    In my opinion, no.
  • Is LOC a good indicator of complexity or significance?
    In my opinion, no.
  • What's cyclomatic complexity?
    You can learn more by reading Wikipedia's article, but in essence it's a measure that attempts to see how complicated a codebase is. It's fun to look at, but also necessarily increases as the size of a project increases, so it's not really an easy-to-use number, and there have been suggestions that it doesn't really correlate with a higher number of bugs. So... is it fun? Yes. Useful? In my opinion, no. Some complexity numbers (average lines per method and class, for example) can be more useful at times. If you care, you can get these complexity numbers from PHPLOC and another tool called PhpMetrics; take a look at this Laravel News post on code complexity to learn more.
  • What are logical lines of code?
    Also known as "effective lines of code", these are the lines of code which actually instruct the parser to do something. As far as I can tell, this is to differentiate them from templates, headers, comments, and ineffective code; additionally, multiple instructions on the same line should count as multiple LLOC.

LOC results from each tool on the same project

I ran all of these tools on the same project, Symposium (one of Tighten's open source projects), to see how they compare.

PHPLOC

wget https://phar.phpunit.de/phploc.phar
php phploc.phar --exclude vendor --exclude node_modules symposium/

Result:

  • LOC: 13,609
  • Comment Lines of Code: 2,431
  • Non-comment Lines of Code: 11,178
  • Logical Lines: 2,958
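These figures hang together the way you'd hope: the non-comment count is the total minus the comment lines. A quick bit of shell arithmetic with the numbers above confirms it (the variable names are just for illustration).

```shell
# NCLOC = LOC - CLOC, using the PHPLOC figures for Symposium listed above.
loc=13609
comment_lines=2431
ncloc=$((loc - comment_lines))
echo "$ncloc"   # prints 11178, matching the reported Non-comment Lines of Code
```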

CLOC

brew install cloc
cloc --exclude-dir=vendor,node_modules symposium/

Results:

  • LOC: 26,068
  • Comments: 4,296
  • PHP LOC: 7,653
  • PHP Comments: 2,444

PHPStorm

Jose Soto ran these for me.

Results:

  • LOC: 52,168
  • PHP LOC: 12,096
  • PHP source code LOC: 7,849
  • PHP comment LOC: 2,406

Note: In order to get this plugin to give me good results, Jose had to delete the vendor/ and node_modules/ directories.

Sublime Text

With regex enabled, find ^.*\S+.*$ in:

-./vendor/*,-./node_modules/*,-./.git/*,*.php

And then look at the results at the bottom.

Results:

  • PHP LOC: 11,468

Manual find command

cd symposium
find . -type f ! -path './vendor/*' ! -path './node_modules/*' ! -path './.git/*' ! -name '*.log' -name '*.php' | xargs wc -l

Results:

  • PHP LOC: 13,609

Silver Searcher (ag)

cd symposium
ag -l --php --ignore-dir=vendor --ignore-dir=node_modules | xargs wc -l

Results:

  • PHP LOC: 11,795

Conclusion

Wow. This post took way longer than I expected. Kudos to you for reading this long. Geez. I am tired.

In the end, I'd still recommend PHPLOC if you can. It is the most contextualized and provides additional details several others don't. It makes it easy to exclude vendor directories. It's good. That's all.


Comments? I'm @stauffermatt on Twitter

