
Welcome Back DarkMatters - Sept/27th/2018 - What happened?!?



And we... are back!! Thank you to our big family/community who's weathered it through with us to make it back to the boards.  The struggle has been nothing short of awful... but Schot has managed to find a way. :hugs:  Our databases are all secure after the massive failure of our hosting company, MDDHosting, which had been keeping us safe for almost the last ten years, and we're all now on NameCheap servers.  One of the largest and kindest-run companies in the world ( :bounce: )  

MDDHosting had a huge, massive failure of its servers after one of its very very VERY senior employees pressed the wrong button.  Almost everything was lost and they were down to the last backup server, with all their sites going down for days, and ours being on the last server to come back online, which VERY much frustrated us.  

Here are the first and second notes from them... they're a good piece of drama. Bon appétit:

Quote

 

Posted 22 September 2018 - 10:58 AM

While I was hoping to save some of this for the official RFO [Reason For Outage] - enough people are getting tremendously upset over this that I'm going to spell out what I can now - keeping in mind that I will provide more details when I can.

**What happened?**

First and foremost - this failure is not something that we planned on or expected.  A server administrator, the most experienced administrator we have, made a big mistake.  During some routine maintenance where they were supposed to perform a _file system trim_ they mistakenly performed a _block discard_.

**What does this mean?**

The server administrator essentially told our storage platform to drop all data rather than simply dropping data that had been marked as _deleted_ by our servers.

**Why is restoration taking so long?**

Initially we believed that only the primary operating system partition of the servers was damaged - so we worked to bring new machines online to connect to our storage to bring accounts back online.  Had our initial belief been correct - we'd have been back online in a few hours at most.

As it turns out our local data was corrupted beyond repair - to the point that we could not even mount the file systems to attempt data recovery.

Normally we would rely on snapshots in our storage platform - simply mounting a snapshot from prior to the incident and booting servers back up.  It would have taken minutes - maybe an hour at most.  We are not sure as of yet, and will need to investigate, but snapshots were disabled.  I wish I could tell you why - and I wish I knew why - but we don't know yet and will have to look into it.

We are working to restore cPanel backups from our off-site backup server in Phoenix, Arizona.  While you would think the distance and connectivity were the issue - the real issue is the amount of I/O that backup server has available to it.  While it is a robust server with 24 drives - it can only read so much data so fast.  As these are high-capacity spinning drives - they have limits on speed.

Our disaster recovery server is our **last resort** to restore client data and, as it stands, is the _only_ copy we have remaining of all client data - except that which has already been restored which is back to being stored in triplicate.

**What will you do to prevent this in the future?**

As we've been working on this and running into issues getting things back online quickly, we have been discussing what changes we need to make to ensure both that this doesn't happen again and that we can restore more quickly in the future should the need arise.  I will go into more detail about this once we are back online.

**We are sorry - we don't want you to be offline any more than you do.**

Personally I'm not going to be getting any sleep until every customer affected by this is back online.  I wish I could snap my fingers and have everybody back online or that I could go into the past and make a couple of _minor_ changes that would have prevented this.  I do wish, now that this has happened, that there was a quick and easy solution.

I understand you're upset / mad / angry / frustrated.  Believe me - I am sitting here listening to each and every one of you about how upset you are - I know you're upset and I am sorry.  We're human - and we make mistakes.  In this case **thankfully** we do have a last resort disaster recovery that we can pull data from.  There are _many_ providers that, having faced this many failures - a perfect storm so to speak - would have simply lost your data entirely.

This is the **first** major outage we've had in over a decade and while this is definitely major - our servers are online and we are actively working as quickly as possible to get all accounts restored and back online.  For clarity - the bottleneck here is not a staffing issue.  We evaluated numerous options to speed up the process and unfortunately short of copying the data off to faster disks - which we did try - there's nothing we can do to speed this up.  The process of copying the data off to faster disks was going to take just as long, if not longer, than the restoration process is taking on its own.

Once everybody is back online - and there are accounts coming online every minute - we will be performing a complete post-mortem on this and will be writing a clear and transparent Reason For Outage [RFO] which we will be making available to all clients.

I hope that you understand that while this restoration process is ongoing there really isn't much to report beyond, "Accounts are still being restored as quickly as possible."  I wish there was some interesting update I could provide you like, "Suddenly things have sped up 100x!" but that's not the case.

I am personally doing my best to reach out to clients that have opened tickets and keep them updated as to when their accounts are in the active restoration queue.  While we do have thousands of accounts to restore - our disaster recovery system actually transfers data substantially faster with fewer simultaneous transfers.  While it sounds counter-intuitive - we're actively watching the restoration processes and balancing the number of accounts being restored at once against the performance of the disaster recovery system to get as many people back online as quickly as possible.

Most sites are coming back online after restoration without issues, however, if once your account is restored you are still having issues - we are here to help.  While we are quite overwhelmed by tickets like, "WHY IS THIS NOT UP YET!?!?!"  "WHY ARE YOU DOWN SO LONG!?!:)!!"  "FIX THIS NOWWWW!" - we are still trying to wade through all of that to help those that have come back online and are having issues - as few and far between as they have been.

If you have any questions - we will definitely answer them - but please understand that while we're restoring accounts we're really trying to focus on the restoration of services as well as resolving issues for those whose accounts have already been restored.

Again - I am sorry for the trouble this is causing you - we definitely don't want you offline any more than you do and will have all services restored as quickly as we can.

 

 


And here... is the doozy... this is the one that made us start seething and had Schot start looking to get us transferred to a new home, where he has stationed many of his webwork clients:

Quote

 

Hello,

First and foremost I want to apologize again for any issues caused to you by this extended outage. If you are still offline we are working to restore your services as quickly as possible. I am also sorry if we are not responding to your support ticket as quickly as we normally would. This outage has been an absolute nightmare and something that I honestly never envisioned would happen even though we've always done our best to plan and to try to expect the unexpected.

I do understand that downtime is unacceptable and that not being able to recover from a disaster quickly is not acceptable either.

We are a small company and we have never pretended otherwise, and we have always been proud of the services and support that we've given to our clients. I founded this company in 2007 after being personally frustrated that I couldn't find a hosting provider that didn't ignore you, give you copy-and-paste answers, or run services that were unreliable or offline more often than not.

I will be honest in that I have had several companies over the years attempt to buy us out and the offers were always good. The reason I've never sold this company is because I know that if I hand this company and our clients over to somebody else - the quality of service and support will not be maintained. I have been in this industry long enough and seen enough sales to know what happens to the clients of a company that gets sold and that's not something that I want to see happen.

My personal goal is that when my two sons, presently 3 and 6, are older, one of them will want to work with me at this company and that we will always remain a family business even when I may not be directly involved in all day-to-day operations. We're not a huge corporation or owned by one and we would very much like to stay that way.

Since 2007 we have had a pretty solid track record. We did experience a 72 hour outage in 2008 due to the data center our server [yes, one] was in catching fire and experiencing an explosion. Even then - when we were so small - that was a stressful and exhausting experience. I still remember how I felt helpless to resolve the issue and to do anything for our clients.

I have always made sure that we had backups of client data and, in many cases, backups of our backups. I have seen over the years that issues can and will happen and that it's only a matter of time. Google and AWS have both had issues and they invest millions, if not billions, into making sure they have no downtime. We don't have anywhere near their budget but we have always invested in reliability and have never been foolish enough to believe that we were completely isolated from downtime and unexpected issues.

We moved to our new StorPool powered storage cluster last year and the platform has been absolutely amazing. This outage is in no way related to the platform, and there is a major feature of the platform that, if we were using it properly, would have allowed us to restore from this incident extremely quickly: snapshotting. The honest truth of the matter is that I knew that snapshotting was available and I thought that we were making use of it as a first line of protection against major outages and issues when it comes to data. I do not personally manage the storage platform as that is a bit out of my skill set, but we do have an administrator that manages the storage, and StorPool also oversees the storage and has always been there within minutes to help us when we need it.

As a part of managing our storage cluster we do have to keep tabs on the total amount of storage available on the cluster as well as the total free space. On Friday afternoon we received an alert from StorPool monitoring that we were getting low on SSD space so I reached out to the administrator that handles our storage and asked him to make sure that our servers were doing what is called "discarding". On distributed storage like this when a server deletes data it doesn't actually physically overwrite the data but sends a command to the storage platform letting it know that the block of data is no longer in use. This generally happens automatically but there are some situations where it won't happen automatically and we do have to issue a manual file system trim.

The administrator that handles our storage, at my request, began to look over the cluster to make sure that everything was good to go. I had discussed that we needed to add more solid state disks to the cluster to increase our capacity but did want him to run a manual file system trim to make sure we weren't wasting any space. Keep in mind that a trim simply makes sure that the operating system running your server has communicated all deleted blocks to the storage platform. The administrator performing this work intended to run an "fstrim" [file system trim] to remove any extra blocks but actually ran a "blkdiscard" [block discard]. This is no easy mistake to make as these commands are entirely different and perform different tasks. A block discard, by default, discards all blocks on a device regardless of whether they hold important data, file system data, or anything else.
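
For anyone who wants to picture the difference, here is a rough, purely illustrative Python sketch of a toy block device. The real fstrim and blkdiscard utilities operate on kernel block devices rather than Python dictionaries, so treat this only as a conceptual model of "discard what the file system says is free" versus "discard every block on the device":

```python
# Toy model of the difference between a file system trim and a block discard.
# Purely illustrative -- real fstrim/blkdiscard act on kernel block devices.

def fstrim(blocks, free_blocks):
    """Discard only the blocks the file system reports as free/deleted."""
    for addr in free_blocks:
        blocks[addr] = None  # tell the storage layer this block is unused
    return blocks

def blkdiscard(blocks):
    """Discard every block on the device, live data included."""
    return {addr: None for addr in blocks}

# A tiny "device": four blocks, two holding live data, two already freed.
device = {0: b"client site", 1: b"database", 2: b"old temp file", 3: b"deleted mail"}
fs_free_list = [2, 3]  # what the file system considers deleted

after_trim = fstrim(dict(device), fs_free_list)   # live data survives
after_discard = blkdiscard(dict(device))          # everything is gone

print(after_trim)     # {0: b'client site', 1: b'database', 2: None, 3: None}
print(after_discard)  # {0: None, 1: None, 2: None, 3: None}
```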

The administrator very quickly realized the huge mistake he had made and did what he could to immediately cease the block discard and to preserve what data he could. I suppose you could say the incredible speed of our storage cluster worked against us in this case: the discard was able to drop enough data, even in a few seconds, to essentially corrupt everything we had stored. This is where snapshots would have saved us.

If we were using snapshots as I thought we were, all we'd have had to do is shut down all of the client servers, mount a snapshot from prior to the block discard, and then boot the servers back up. There would have been a small amount of lost time/data due to rolling back to a prior snapshot, but we would have been back online within minutes. Snapshotting simply keeps track of data changed on the storage platform from the time that the snapshot is taken until the time the changes are merged into that snapshot so that a new one can be created. In essence, all changes after the snapshot can be discarded or ignored in an emergency to bring everything back to an earlier state without major impact.
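
As a loose illustration of why a snapshot rollback is so fast, here is a toy Python sketch of the idea described above: writes made after a snapshot land in an overlay rather than overwriting the snapshotted state, so rolling back is just discarding the overlay. StorPool's real block-level implementation is of course far more sophisticated; the class and method names here are invented purely for illustration.

```python
# Toy copy-on-write snapshot: changes after the snapshot live in an overlay,
# so rolling back is simply throwing the overlay away. Names are invented
# for illustration; real storage snapshots work at the block level.

class SnapshottedVolume:
    def __init__(self, base_blocks):
        self.base = dict(base_blocks)   # state captured by the snapshot
        self.overlay = {}               # every change made after the snapshot

    def write(self, addr, data):
        self.overlay[addr] = data       # the base image is never touched

    def read(self, addr):
        return self.overlay.get(addr, self.base.get(addr))

    def rollback(self):
        self.overlay.clear()            # "minutes, not days": nothing to copy

vol = SnapshottedVolume({0: b"client data", 1: b"more client data"})
vol.write(0, None)        # a catastrophic write (e.g. an accidental discard)
vol.write(1, None)
vol.rollback()            # discard everything since the snapshot
print(vol.read(0))        # b'client data' -- back to the pre-incident state
```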

I do not have an explanation as to why snapshots were not in use, as this is the primary and first line of defense. It was my understanding that snapshots were in use, and this is one of the big, and surprisingly simple, changes we are making moving forward. Having the ability to roll individual servers or the whole storage cluster back to a time just prior to a major incident will give us the power to recover from any normally unrecoverable errors or issues quickly and efficiently with minimal impact.

The administrator that made this mistake has been working with me and doing his best to help us recover from this incident. At this moment he is actually taking a few hours off, as he has gotten to the point of a total anxiety attack and has been working to hold himself together since the incident occurred. He's distraught because he knows that he could have avoided this mistake by simply sending the correct command to the system, and he doesn't have an explanation for the mistake beyond that he ****** up. This wasn't malicious in nature and, as sad as it is to say, it was human error. I do not know if it was carelessness or a simple lapse in clear thinking but, regardless of why, we are where we are now.

I have been in this industry long enough both as a customer initially as well as a provider to see how important it is to have backups of your data. I have seen all too often providers that go completely out of business after losing all of their client data and clients that were attacked by malware or performed a bad update and didn't take a backup first. I've also seen hosting clients of other providers lose everything they've worked on for years or even decades due to not keeping their own backups and a provider losing their data. These are things I've always worked to make sure would not happen at our company.

For about a year we have had a backup server in a data center in Phoenix, Arizona. Customers have had access to this backup server to conduct their own restorations of their files, databases, email accounts, etc., and it has been a great convenience. The idea behind this backup server is that in an absolute worst-case scenario, such as the primary facility being destroyed by a natural disaster, we would have a safe copy of all client data that could be used to restore services in another facility. This server holds 14 copies of all client data and has been great when a client needed to restore something from a day, a week, or a couple of weeks ago.


Now here we stand in the biggest disaster my company has faced in over a decade. The longest outage and the most stressful situation - all due to a simple wrong command on a keyboard sent once a couple of days ago. Due to oversight on our part or sheer ignorance we are not able to simply mount a snapshot and recover within minutes. This is something that we are going to change as soon as we have some time to devote to making it happen.

We have been doing our best to keep up with all support tickets opened, tweets sent, posts on our forums, etc. I will be honest in saying that we are getting new support tickets opened at the rate of a few every few seconds - far faster than we can keep up with. I am sure that this adds to the frustration of the situation when you open a ticket and don't get your normal nearly immediate response from our support staff. I am sure some of you feel like you've been ignored or like we're simply doing a poor job or ignoring the issue. I can assure you that although our support response times are far longer than normal it's not because we aren't doing our best. We haven't walked away and we aren't ignoring you and we are taking the time to do our best to address each individual concern personally and not to send pre-defined replies whenever possible.

The backup server that we have in Phoenix is stuffed full of regular spinning hard drives to give us the capacity we need to hold a copy of all data. Due to the number of drives we needed for the capacity, we decided to use some compression to both reduce the total storage footprint needed and speed the system up. The idea was that, since less data had to be read from or written to the disks thanks to compression, we could trade CPU time spent on compression for space and speed on the storage. This has worked well over the year for normal one-off restorations and nightly backups of everything.

When we set this server up we did get a copy of all data to it over the course of a few weeks. This isn't because we couldn't have done it faster but simply because we were trying to avoid using an inordinate amount of bandwidth. At our level we don't pay for data transfer by the amount transferred but by how long we use a large amount of bandwidth. We can, for example, use 1,000 megabits per second every second for a month without paying any overages, but if we were to use 2,000 megabits for 3 days our bandwidth bill would double. We did perform testing on the server to make sure that it was capable of receiving, handling, and sending large amounts of data quickly. In short, we wanted to make sure that in the event that we needed to conduct a restoration it would perform up to our requirements.
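
The billing arrangement described here reads like the common burstable / 95th-percentile model, in which the bill tracks a high percentile of sampled bandwidth rather than total bytes moved. A rough Python sketch of that idea follows; the 95th percentile and the 5-minute sampling interval are assumptions for illustration, not MDDHosting's actual contract terms.

```python
# Rough sketch of 95th-percentile ("burstable") bandwidth billing, one common
# way a bill depends on *how long* you push a high rate rather than on total
# bytes moved. The 95th percentile and 5-minute samples are assumptions for
# illustration, not MDDHosting's actual contract terms.

def billable_rate_mbps(samples_mbps, percentile=0.95):
    """Drop the top (1 - percentile) of samples, bill on the highest remaining."""
    ordered = sorted(samples_mbps)
    index = int(len(ordered) * percentile) - 1
    return ordered[max(index, 0)]

samples_per_month = 30 * 24 * 12          # one sample every 5 minutes

# Steady 1,000 Mbps all month: the billable rate is 1,000 Mbps.
steady = [1000] * samples_per_month
print(billable_rate_mbps(steady))          # 1000

# 2,000 Mbps for 3 days is 10% of the month, above the 5% that gets
# discarded, so the billable rate doubles -- matching the example above.
bursty = [2000] * (3 * 24 * 12) + [1000] * (27 * 24 * 12)
print(billable_rate_mbps(bursty))          # 2000
```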

Over the last year we did not have cause to perform any large restorations or operations. I do not, as of right now, have an explanation as to why our backup server has gone from fast to slow. I do not know if it's some sort of fragmentation of the data stored, if it has to do with how our backup system performs incremental copies to keep multiple restoration points, or something else. I do wish I had some sort of explanation to provide, but I haven't been able to figure one out and have not been able to find a good way to get it to go faster. The bottleneck in this recovery is the disk storage in this backup server.

Initially when the mistake happened and the administrator realized he had discarded data he thought it had only been performed on the operating system device - that client data was safe and intact. Some of the servers were still online and serving requests although they were experiencing issues and would crash in short order. The initial belief was that we could reconfigure and reinstall our operating systems on our client machines and then simply reconnect client storage and, while we'd have a lot of tedious things to fix and address, we'd recover fairly quickly and with minimal to no client data loss.

It took 2 or 3 hours for us to reprovision all client machines, to install the operating systems, control panels, and all software and settings necessary for the servers to perform their jobs and to serve data for our clients. We do perform this from time to time on a new server as we need more capacity and to be honest I think we did a reasonable job of getting as many servers as we needed configured and online as quickly as we did. I thought that was going to be the end of the major outage and the beginning of days or weeks of fixing small issues and glitches here or there and I was terribly wrong.

We found that when we went to mount the client storage to the servers that the devices wouldn't mount, the file systems were damaged, or the data stored was corrupted to the point that we couldn't simply recover from it. I'm sure if we were a substantially larger company with a much larger budget it may have been possible to find some way to recover some of this data. One of the primary reasons I was glad, even during such an event, to think that we were going to be able to bring these disks online was to avoid any extended downtime and issues for our clients.

Unfortunately we were wrong and the data was not usable. This meant that we were now not only in a state of disaster but that the disaster was not going to be over nearly as quickly as we had originally thought. If we were restoring a single server it would have been done in a few hours and, while that would have not been fun for anybody, it wouldn't have taken very long. Unfortunately we are in a disaster scenario where we are having to restore all servers. No matter how we slice it - be it restoring all servers at once slower, or restoring one server at a time a bit quicker - the speed of overall recovery is about the same.

I know that many of our clients have been extremely frustrated with our inability to provide what feels like a "simple ETA," and I wish that we were able to do so. One of the issues with this backup system that we are using is that the performance is extremely inconsistent. We may be able to restore data extremely quickly for a few minutes and then the data rate will drop to an extremely low rate for a while. We may restore 5 accounts in a few minutes and then it may take an hour to restore the next account in the queue. Another issue is that our backup and restoration software doesn't give us any insight into the process. We can't see backed-up accounts by their size so that we can prioritize small accounts to restore as much service as quickly as possible. We can't see how fast the transfer of an account is going, how long it's taken, how long it has remaining, or anything but the account name itself and that it is in progress.

Every time I have tried to generate a realistic ETA, that ETA would change from minute to minute or hour to hour - and I have never been a fan of giving incorrect information. The truth is that I hadn't gotten any sleep from the time we went down until just a few hours ago. I had been sitting at my computer working on restoring services and helping clients from the moment the issue occurred until I was no longer able to perform my duties, and even then I kept at it for many hours more, doing my best. I am sorry if you feel we haven't communicated effectively due to delays in responding to support tickets or an inability to tell you when you would be back online or how long this issue is going to take to resolve.

Due to the issues with the backup server that we are restoring from, we have decided to move in a new direction when it comes to restoring services. It is already clear that this is going to be an extended outage and that, no matter how much we want things to be back online quickly, we are limited by this backup server and its capabilities. One of the larger issues is that we can perform a single data-copy stream fairly quickly, but the second we need to do 2, or 3, or 4, or more at the same time they all slow to a crawl and come nowhere near the throughput of a single data copy.

When we are restoring using our backup software there is a data stream for each account on each server. We have 12 client servers we need to restore, and even if we go one account at a time we're looking at a minimum of 12 data streams. Normally in such an instance we would be restoring 4 to 16 accounts per server to get things restored as quickly as possible. Normally we would expect to saturate a 1 Gbps or 2 Gbps networking link, and as soon as we determined that we needed to restore from this system we got with the facility hosting the backup server and requested they swap in 10 Gbps networking. We really did believe that we were going to saturate the 1 Gbps link in the server during these restorations due to the number of streams of data we were going to need to support.

Unfortunately we found that we were not even able to use 25% of 1 Gbps, much less come anywhere near saturating 1 Gbps or even touching 10 Gbps. Some have asked us why we didn't relocate the backup server closer to where we are conducting the restorations - such as flying it over. The simple answer is that it's not the connectivity between them that is the problem and causing delays. Even if the backup server were sitting right next to the servers being restored, the data transfer rate issue would still exist.

Due to the fact that there is no quick recovery from this and that we get much faster single-stream throughput than multi-stream throughput, we are replicating the whole server backups off of this storage and onto solid state RAID arrays one at a time. We are getting between 1 Gbps and 4 Gbps data transfer rates on this single stream, which is far more than we have been getting trying to restore directly from the backup server to the client servers. The downside is that this means there is an intermediary step that is going to take time before we can actually perform any restorations. For example, the transfer that is running right now is going to take an approximate total of 6 hours to run and is about 3.5 hours into that run.
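
For a sense of scale, a quick back-of-the-envelope calculation: at a sustained 2 Gbps (roughly the middle of the 1-4 Gbps range quoted above), a 6-hour copy moves a little over 5 TB. The 2 Gbps figure is an assumption made only for the arithmetic, not a measured average.

```python
# Back-of-the-envelope: how much data a 6-hour single-stream copy moves.
# The sustained 2 Gbps figure is an assumption for the arithmetic only;
# the message quotes a 1-4 Gbps range, not a measured average.

sustained_gbps = 2.0
copy_hours = 6

bits_moved = sustained_gbps * 1e9 * copy_hours * 3600
terabytes_moved = bits_moved / 8 / 1e12
print(f"~{terabytes_moved:.1f} TB moved in {copy_hours} hours")  # ~5.4 TB
```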

Once each stream copy of a server's backups is done, we are going to remove the solid state array from the backup server and install it in a new chassis with 10 Gbps networking to conduct restorations from Phoenix to Denver as quickly as possible. For comparison, the restoration of our S1 server via the normal restoration process from the backup server could take days or even a week, whereas this process, we believe, will allow us to restore the server as a whole within hours.

The only real upside to the slower approach of restoring accounts directly from the backup server is that, as each account is finished, that account comes online. So even if it were to take days - there would be accounts online and recovered within the first hour, first day, etc. With the process we're running now - during this 6 hour copy - nobody new is going to be coming online. The upside to the route we've chosen to take is that once this 6 hour copy is done, client accounts from that backup are going to be coming online substantially faster with a shorter overall downtime for everybody.

I am personally extremely sorry that this has happened and that you have experienced these issues. I am doing my absolute best to hold everything together and to restore services. Once we are recovered from this incident there are a great many changes that we are going to be making to our backup system as well as to our overall policies and procedures concerning data security and recovery. The most immediate change is going to be that we will be using snapshots on our storage platform to protect individual servers as well as the whole network from major catastrophe. We also plan on having a secondary redundant copy of these snapshots/our data local to the storage cluster.

Data storage isn't cheap and the storage platform we are running is expensive before you take into account that we store three copies of every piece of data in our storage platform to protect against drive and server failure. We could lose more than half of our storage servers or drives and our storage cluster would remain online and operational. This means to store 20 TB of data we need more than 60 TB of actual storage - tripling the cost of the storage itself. Another factor that makes this so expensive is that we rely on enterprise class storage and not your standard consumer grade off-the-shelf storage. Even with all of this additional cost, we are going to bring online a storage platform capable of holding a local copy of our data as an extra layer of protection beyond the snapshots we will be performing.

We will also be overhauling and replacing our off-site disaster recovery backup systems. We will most likely move to having several smaller servers with high performance storage - each handling one or two client servers - rather than one big behemoth of a backup server. Although we are planning and putting steps in place to ensure we never get to the point of needing to do a disaster recovery like this again it would be remiss for us not to plan for it regardless. Should we ever have to perform disaster recovery from our off-site location in the future this setup will allow us to sustain numerous high speed restorations without any one specific bottleneck.

The downside to this is that all of it, short of the snapshots we're enabling, is going to be extremely expensive. Now is obviously not going to be a good time for us to be spending money, as I know that there are going to be a lot of our customers that have lost faith in us over this incident. I know that we're going to have cancellations and more than likely we're going to lose a lot of sales and new revenue over this incident. It is unfortunate that the time we most need revenue to invest in protecting against another major incident like this is also the time we are likely to suffer the most when it comes to revenue.

If you are losing money, sales, clients, or anything else due to this outage I am extremely sorry. I want nothing more than to restore all services and to get everybody back online. The truth is that I didn't start this company to get rich and I don't do it for the money but because I genuinely enjoy providing a solid service and quality support. I enjoy providing a hosting platform that our clients are happy with and providing support that is above and beyond what other providers can or are willing to provide. While I know for a fact that we will get all data restored and that it's only a matter of time - I do hope that you have not lost hope in us and that you understand that we realize we majorly ****** up. We ****** up on more than one level in this situation and we know it and are going to be making changes to protect you as well as us from any further incidents like this.

We have been in business since 2007 and other than the data center fire early on we have had no major issues or incidents. We've had a server issue here or there but have always been able to recover within, at most, a few hours. We've been in business almost 12 years or 4,250 days with exceptional uptime, reliability, and support. I do understand that we have been down, at this point, for nearly 38 hours. I hope that you can see that while this is a long time that in the grand scheme of things we've been reliable and that we are going to do everything in our power to make sure that we stay reliable and online moving forward.

We have about 20 TB of total data to restore and it's going to take us about 2 hours per terabyte to copy the data off of our disaster recovery backup server in Phoenix onto the solid state storage arrays. As we do not have to copy the entire 20 TB completely before we can begin restoring from the solid state storage, we will be beginning high-speed restoration of services as soon as 2 hours from now. As soon as we have a server's data copied over, we will begin restorations.
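
Taking those figures at face value, the staging copy alone works out to roughly 40 hours end to end, with restorations overlapping it as each server's slice lands. A trivial sketch of the arithmetic follows; the even split across the 12 client servers is an assumption for illustration only.

```python
# Simple arithmetic from the figures quoted above: ~20 TB at ~2 hours per
# terabyte for the staging copy, with each server's restoration starting as
# soon as that server's copy finishes rather than after all 20 TB are moved.

total_tb = 20
hours_per_tb = 2

print(f"Full staging copy: ~{total_tb * hours_per_tb} hours end to end")  # ~40 h

# If the 12 client servers held roughly equal shares (an assumption for
# illustration), each server's slice would stage in well under 4 hours,
# which is why restorations can begin long before the full copy completes.
per_server_tb = total_tb / 12
print(f"~{per_server_tb * hours_per_tb:.1f} hours to stage one server's share")
```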

We will be performing these copies in the order that the machines were provisioned. Ultimately there is no way for us to copy all machines at once, or we would do that, so we had to decide how we were going to handle this. As we have added machines as we've added new customers we want to restore service to our oldest clients that have been with us the longest first. Please don't misunderstand this to say that we don't wish that we could restore data for everybody all at once or that we value older clients more than newer ones. We simply had to choose a way to decide which servers got restored first and no matter how we go about it there are going to be clients that experience longer outages than others.

As we go through this process I will be providing status updates on our forums at https://forums.mddho...92118-09222018/. I will also be answering every support ticket that I can personally, responding on Twitter and Facebook. I will send an email to you if there is something important we need to share where we can't rely on you checking other means of communication like our forums - but most centralized communication is going to happen on our forums on that thread. 

As soon as we begin conducting a restoration of the data from the higher speed storage we're copying our backups to I will do my best to provide an accurate ETA for the restoration of services on a server-by-server basis. Until we begin that restoration process anything I provided would be nothing more than a guess and I don't want to mislead by providing inaccurate information. I know that it has been frustrating not to have an ETA and as soon as I am able to provide one you will have it.

Any replies to this email will come directly into the management department which I am handling entirely on my own. Please keep in mind that I am also handling regular tickets so I will do my absolute best to respond to you directly as I get a chance. There is a good chance that over the next several days I am going to be so overwhelmed with support tickets, email, social media, and the like that my responses may come with a fair bit of delay but I will get back to you as soon as I can.

Sorry for such a long message as I wanted to make sure that I was detailed and that I was able to give a clear picture of where we stood, why we stood there, and what we are doing to get back where we need to be.

I am sorry that we made mistakes that landed us in this situation and that we have dropped the ball and ****** up. I hope that you can give us the chance to recover from this and to continue providing you the solid and reliable service and support that I know we can provide and that we strive for every minute of every day.

I am sorry, I really am. I am going to make sure that it is made right even if it takes longer than anybody would like and I will be personally available as much as I can be both until we are fully back online as well as for an extended period after for any feedback, questions, comments, or concerns you may have.

I hope that you are able to give us the chance to continue hosting your account once we are able to recover from this.

Thank you for taking the time to read my message.

Sincerely,

Michael Denney
MDDHosting LLC

 


 

 


Working in IT I can say there's no excuse for what happened.  The two commands that got swapped are nothing alike in form or function and there's no reason to run the one they did unless they wanted to wipe everything.  I can't even fathom how that happened.

But, not going to dwell on it.  Just relieved as I'm sure you all are to be back online and with a reliable hoster backing us up.

2 hours ago, Dragon Brother said:

Welcome back fellas. I can imagine there's been a stressful few days in the background for you guys so hope you can now sit down with a beer or two to relax!

We're in glee that it's all over, and even more so that you guys are back with us here!

:hugs:

 

gogo

1 hour ago, Flix said:

Working in IT I can say there's no excuse for what happened.  The two commands that got swapped are nothing alike in form or function and there's no reason to run the one they did unless they wanted to wipe everything.  I can't even fathom how that happened.

But, not going to dwell on it.  Just relieved as I'm sure you all are to be back online and with a reliable hoster backing us up.

Everyone has been mulling this over, Flix... we're sure it's not quite what they say it is... a cover-up? And we're SO happy with our new hosts.  Twenty years in the business, and very kind, always accessible and thoughtful.  

The forum is now faster!

:superman:

 

gogo


Oh wow, I think this expression has gotten stuck to my face from holding it for so long... 

[Image: Edvard Munch, The Scream (1893)]

 

Woohoo!  Good to be back.  :dance2:

But now I'm looking at doing some updating to the forum and wiki software...  :oooo:


Thank you to gogo, Schot and everyone else that has a hand in bringing it back to life!

 

I would truly miss all the interesting characters on DarkMatters if it stopped... 

 

:heart:


I'm just really glad that everything worked out okay :)

It was such a huge relief to see that the forums and the Wiki were up and running again, and it must have been such a pain in the ass for you guys to have to deal with that…  Hopefully this has taken a lot of stress off your shoulders :sweating:

10 hours ago, bhj said:

Nice to see everything is working again :agreed:

 

:viking:

Hi Bhj!! Thanks for posting and the offer... I remember all the seemingly millions of hours you and Schot put into the beginning days of this forum. 

Your offer of help is the heart of this forum. 

Happy you made your way back here home!!

😊😍😊

gogo

On 9/27/2018 at 5:35 AM, gogoblender said:

Internally on Mike's community pages and his Twitter there are a lot of angry customers posting, and we were one of them.  We don't understand why, when the servers went down, MDDHosting did not "own up" to the problem and say that this was a HOSTING issue rather than a customer issue, as is the case with a lot of gaming websites that get lost because of inattention or money.   Schot has had huge success with NameCheap.  Over the years with Mike we've always found that while he executes quite well, his tone isn't ours, and the family feeling that we've always felt was an integral part of being "darkmatters or wikkiites" wasn't there.  This always made us question decisions and second-guess his sentiment, past apologies and actions.

When he almost charged us $400 American for hosting last year and then professed to making a mistake with no real apology that we could connect with, we knew something was coming, and this was it... a true disaster where years of data, carrying many people's livelihoods and passions, was almost wiped out with a single stroke...

:blink:

We're in a much better place now. 

We've been researching pricing and response times... we're actually running the sites on TWO hosts now... but, as we're very sensitive to response time since we're almost always here ( :oooo: ) clicking and testing, we noticed that the response from these new servers is significantly faster, and that clicky feeling is good.

 

We are so so sorry that this happened. 

When the sites went down, we started posting at the DarkMatters Facebook site and we made a new thread over at Steam (Thank you Androdion for picking it up! :hugs: ). 

Thankfully we were able to connect to the community, and with Flix emailing me, we were able to keep in touch with a smaller community during these days of darkness.

Well, we're home, folks... man, what a journey...

Exhausting, frustrating, and with a little bit of nostalgia and trepidation. 

We're leaving an old home but are very, very excited for the coming years.

 

We have the greatest of gratitude for all of you who have found your way back to us.

Thank you

You're home!

:gogo:

 

gogo

 

Navigating this unexpected and potentially devastating storm, its turbulent waters threatening to long separate us, we now find ourselves reunited, our bonds of friendship, family, and community now much strengthened in the aftermath. We have also now gained another wonderful boon indeed, an "https" secure website. With our newly obtained security and our ever-abiding liberty, our pursuit of happiness is thereby well ensured.

23 hours ago, Delta! said:

Thank you to gogo, Schot and everyone else that has a hand in bringing it back to life!

 

I would truly miss all the interesting characters on DarkMatters if it stopped... 

 

:heart:

Yahoo... our resident top chef, especially Theuns... so happy you're back, our sweet tooths were especially delighted 😀 

😄

gogo

 

 

 

 

22 minutes ago, Hooyaah said:

 

Navigating this unexpected and potentially devastating storm, its turbulent waters threatening to long separate us, we now find ourselves reunited, our bonds of friendship, family, and community now much strengthened in the aftermath. We have also now gained another wonderful boon indeed, an "https" secure website. With our newly obtained security and our ever-abiding liberty, our pursuit of happiness is thereby well ensured.

Hooyaah... zombie just noticed that... woo hooo... yah, together AND well defended!

☀️☀️☀️

gogo

18 hours ago, Excelsior said:

I'm just really glad that everything worked out okay :)

It was such a huge relief to see that the forums and the Wiki were up and running again, and it must have been such a pain in the ass for you guys to have to deal with that…  Hopefully this has taken a lot of stress off your shoulders :sweating:

Yah ... stress ... less! :4rofl:

This forum has always been strong because it's meant to be ours and our family's (you guys! 😃) getaway place... it not being here is like a hole in our hearts. Now that it's back up and we're with a company who's been on the net for more than 20 years, we're feeling better.

😎

gogo

 

