Plugin Downloads Statistics: recent changes
The Plugins Repository serves more than 200,000 downloads of plugins every 24 hours, which can easily double or triple during the major releases. It was starting to become quite a challenge for us to keep, process and display all the data since 2012, which is when we started counting downloads continuously and keeping all the history.
As the IntelliJ Platform matured, the number of plugins and their downloads drastically increased. Our old statistics mechanism (keeping everything on the Plugins Repository web application side in a MySQL database) started failing us, and there were some drawbacks in the user interface and the calculations, as well as some performance issues.
In an effort to improve the experience, we have completely reworked the way we handle plugin downloads statistics. The major changes included:
- Using CDN access logs as the data source instead of an application-side event;
- Shifting ownership of the data to the internal JetBrains statistics team;
- Updating the user interface.
Changing the source: CDN logs vs. application-recorded event
We used to capture a download event on the Plugins Repository side for downloads both from within the IDE and from the web interface. This could lead to performance problems: for example, when many downloads were served simultaneously, we had to batch the statistics updates to the database.
Switching to CDN logs as the source for download statistics means that we can capture each download event individually, so it won’t be lost if there is an issue with a server connection or any other outage.
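To illustrate the idea of deriving download counts from access logs, here is a minimal Python sketch. The log line format (timestamp, HTTP status, URL), the `/download` path, and the `pluginId` query parameter are all hypothetical, chosen only for this example; they are not the actual CDN log format or the real Plugins Repository URLs.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_downloads(log_lines):
    """Count successful downloads per (day, plugin) from access log lines.

    Assumes a hypothetical three-field line: "<ISO timestamp> <status> <url>".
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        timestamp, status, url = parts
        if status != "200":
            continue  # only successful responses count as downloads
        split = urlsplit(url)
        if split.path != "/download":  # hypothetical download endpoint
            continue
        plugin_ids = parse_qs(split.query).get("pluginId")
        if not plugin_ids:
            continue
        day = timestamp[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        counts[(day, plugin_ids[0])] += 1
    return counts


sample_log = [
    "2017-12-15T09:00:00Z 200 /download?pluginId=alpha",
    "2017-12-15T09:03:10Z 200 /download?pluginId=alpha",
    "2017-12-15T09:04:00Z 404 /download?pluginId=beta",
    "2017-12-16T01:00:00Z 200 /download?pluginId=beta",
    "2017-12-16T01:00:05Z 200 /index.html",
]
daily = count_downloads(sample_log)
```

Because every log line is processed independently and offline, a spike in traffic affects only how long the batch job runs, not the application that serves the downloads.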
This switch means that there might be some minor changes in the monthly download pattern from mid-December 2017 onward, as we changed some of the calculation rules. For example, if the IDE downloads a plugin more than once (say, because the initial download couldn’t be installed due to a network connection problem), it is now counted as multiple downloads, whereas previously only one plugin download from a unique IDE within 10 minutes was counted in the statistics. This should only cause a minor variance, however.
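The difference between the two counting rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event tuples and identifiers are made up, and the exact way the old 10-minute window was measured is not described here, so the sketch assumes it starts at the last counted download for the same IDE and plugin.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)


def count_all(events):
    """New rule: every download event counts."""
    return len(events)


def count_deduplicated(events, window=WINDOW):
    """Old rule (as assumed here): at most one counted download per
    (ide_id, plugin_id) pair within `window` of the last counted one."""
    last_counted = {}
    total = 0
    for ide_id, plugin_id, ts in sorted(events, key=lambda e: e[2]):
        key = (ide_id, plugin_id)
        previous = last_counted.get(key)
        if previous is None or ts - previous >= window:
            total += 1
            last_counted[key] = ts
    return total


# Hypothetical events: (ide_id, plugin_id, timestamp)
events = [
    ("ide-1", "plugin-a", datetime(2017, 12, 15, 9, 0)),
    ("ide-1", "plugin-a", datetime(2017, 12, 15, 9, 3)),   # retry after a failed install
    ("ide-2", "plugin-a", datetime(2017, 12, 15, 9, 4)),   # a different IDE
    ("ide-1", "plugin-a", datetime(2017, 12, 15, 9, 30)),  # outside the 10-minute window
]

new_total = count_all(events)           # 4
old_total = count_deduplicated(events)  # 3
```

In this sample the only difference is the retry at 9:03, which the old rule would have folded into the 9:00 download.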
Please note that the latency for calculating the statistics is up to 24 hours. The processing runs early every morning, European time, so on any given day the most recent statistics you see are for the previous day.
Extracting the statistics part: external application to calculate and provide statistics
Extracting the download statistics into a dedicated application allowed us to utilize the resources of JetBrains’ internal statistics team, who will process plugin download logs and calculate download metrics using well-maintained processes and dedicated infrastructure, as they do for the statistics of JetBrains’ own products.
This has helped improve the general performance of the statistics page, as now all the data is prepared outside of the application and delivered to the Plugins Repository via a lightweight API on demand.
As a side note: plugin downloads statistics include downloads of removed updates. These numbers are not visible in the per-update downloads statistics.
Changing the UI
Long story short: we have switched to Highcharts, an SVG-based, multi-platform charting library. Relying on this component lets us focus on improving the statistics UI and UX rather than spending our time maintaining our own custom charting component.
Our statistics charts now support live zooming and annotations. They are lightning-fast, responsive, and mobile-ready.
On the statistics improvements roadmap
The initial support for new statistics was released in mid-December 2017, and we are on our way to delivering even more, such as:
- Breakdown by plugin download type (new or update, to get unique downloads);
- Filter by country;
- Yearly granularity;
- Filter by build;
- Filter / split by channel (stable, EAP, etc.);
- Minor UI/UX fixes on statistics pages, and more.