The "it would be nice if..." I had in mind (again, don't know if it's practical) is a trackable measure of site responsiveness to external requests. E.g., "average milliseconds to deliver a new page during the past 10 minutes" or similar. One could then use standard control charts, etc., to determine when something abnormal needs fixing, and/or track the effects of configuration changes to determine whether they were good, bad, or indifferent.
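For what it's worth, the control-chart idea could be sketched roughly as follows. This is only an illustration: the function names are made up, the baseline is assumed to be a list of per-window average response times in milliseconds taken from a period believed to be healthy, and plain 3-sigma limits are used rather than any particular textbook chart.

```python
from statistics import mean, stdev


def control_limits(baseline_ms):
    """Return (lower, center, upper) 3-sigma limits from baseline window averages.

    baseline_ms: per-window average response times (ms) from a period
    assumed to be healthy. Needs at least two values.
    """
    center = mean(baseline_ms)
    sigma = stdev(baseline_ms)  # sample standard deviation
    lower = max(0.0, center - 3 * sigma)  # a response time cannot be negative
    return lower, center, center + 3 * sigma


def out_of_control(samples_ms, limits):
    """Return the indices of samples that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples_ms) if x < lower or x > upper]
```

Each new 10-minute average would be checked against the limits; a point outside them is a signal worth investigating, and a sustained shift after a configuration change suggests the change had a real effect.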
Google Analytics records full page load time for a sample of page views (1% by default, adjustable by the site owner), provided the page finishes loading and the user has JavaScript enabled. The source of the forum pages shows that Analytics is already in use, so this data is already available for review. Of course, it doesn't include any data points from pages that didn't load far enough for the JavaScript to run, but those are captured by other metrics (also available through Google services that the site appears to be using already).
As an unprivileged user, you can make an HTTP request to the site from time to time and measure the response time. The difficulty is that even under good conditions there is significant variance in response time, so you would need a large number of requests to smooth it out. The Google Analytics data avoids that problem: it is collected on pages that real users are loading anyway, so it yields plenty of data without imposing any additional load.
Since we can't review the existing Analytics data, I have been requesting the forum index every two minutes, starting from your post earlier today. That is infrequent enough to have no perceptible effect on site load, but it also isn't enough to make any decent statements about average response time. On six occasions the site was down, defined as the forum index either not loading at all or taking more than 10 seconds to load, namely:
Fri May 1 12:02:11 MDT 2015
Fri May 1 12:42:11 MDT 2015
Fri May 1 12:50:12 MDT 2015
Fri May 1 13:14:12 MDT 2015
Fri May 1 17:12:11 MDT 2015
Fri May 1 17:14:11 MDT 2015
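A check like the one described above could look roughly like this. It is a sketch, not the script actually used: the function names and the example URL are placeholders, and only the two-minute interval and the 10-second threshold come from the description.

```python
import http.client
import time
import urllib.request

TIMEOUT_S = 10.0  # "down" threshold: no response, or slower than this


def is_down(elapsed_s, failed, timeout_s=TIMEOUT_S):
    """A check counts as down if the request failed or took too long."""
    return failed or elapsed_s > timeout_s


def probe(url, timeout_s=TIMEOUT_S):
    """Fetch url once and return (elapsed_seconds, failed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
        failed = False
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        failed = True
    return time.monotonic() - start, failed


def monitor(url, interval_s=120, checks=3):
    """Probe url `checks` times, interval_s apart; return timestamps of down checks."""
    down_at = []
    for i in range(checks):
        elapsed, failed = probe(url)
        if is_down(elapsed, failed):
            down_at.append(time.strftime("%a %b %d %H:%M:%S %Z %Y"))
        if i < checks - 1:
            time.sleep(interval_s)
    return down_at


# Hypothetical usage (placeholder URL):
# print(monitor("https://forum.example.com/", interval_s=120, checks=30))
```

The elapsed time is checked separately because the `timeout` passed to `urlopen` applies to individual socket operations, so a slow trickle of data can take longer than 10 seconds in total without ever raising an error.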
From this data I think we can conclude that the site is frequently down even on a good day. I don't expect to be able to draw many other conclusions from it, but I will keep collecting it. If the problem is still ongoing in a few weeks, I may post a graph, perhaps a histogram showing the percentage of downtime experienced each day, or during each one-hour period of the day. Hopefully the problem will have been resolved by then by MMM, his current team, and perhaps even other contributors such as you, Syonyk, and myself.
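The per-hour figures for such a histogram could be computed along these lines. Again, this is only a sketch: the function name is made up, and it assumes each check has been logged as a `(datetime, down)` pair, which is not necessarily how the data is stored.

```python
from collections import defaultdict
from datetime import datetime


def downtime_percent_by_hour(checks):
    """Return {hour_of_day: percent of checks that were down}.

    checks: iterable of (datetime, bool) pairs, where the bool is True
    when that check found the site down.
    """
    totals = defaultdict(int)
    downs = defaultdict(int)
    for when, down in checks:
        totals[when.hour] += 1
        downs[when.hour] += int(down)
    # Only hours with at least one check appear, so there is no division by zero.
    return {hour: 100.0 * downs[hour] / totals[hour] for hour in sorted(totals)}
```

Grouping by `when.date()` instead of `when.hour` would give the per-day version. With one check every two minutes there are only about 30 samples per hour, so the hourly percentages will be coarse.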
All that said, I still anticipate that there will be plenty of low-hanging fruit, which should make the resolution a straightforward project.