
What can we learn from these DevOps 'horror stories'?

Published: 2022-08-29 17:39:11

  DevOps can be a key driver of business transformation. But to be clear, DevOps is a process, not a goal: practising it means continuously optimising, learning, and trying better ways of working. Continuously improving your DevOps practices is an important way to drive transformation in your organisation.
  Even some of the best-known companies have made mistakes, with serious consequences, when applying DevOps as part of their transformation strategies. Fortunately, their failures offer valuable lessons.

 SlideShare: Restricting Access to Production Environments

  SlideShare, the world's largest content-sharing community, adopted a DevOps model early on to speed up development.
  Back in 2012, it was a small startup with fewer than 20 employees, development teams split between San Francisco and New Delhi, and a fairly complex infrastructure.
  At one point, a developer was using a new tool to analyze a MySQL database. When he started reordering the columns in a table, he did not realize that the tool was also applying the change to the production database. The change eventually brought down SlideShare.net, cutting off the more than 60,000 users who were trying to access it. Worse, the person in charge at the time did not realize that the tool was performing the operation, and it took 15 minutes of collective effort to identify the root cause.
  Sylvain Kalache, then an operations engineer, commented that while DevOps involves everyone on the team, access to the production environment should be limited to the few people qualified to handle it. That is why configuring fine-grained, role-based access control (RBAC) for teams and individuals matters. Tools like AWS IAM are critical for making sure everyone has the access they need for their daily tasks, but not the full access that could hurt the user experience. In this case, the developer could have tried the operation in a staging environment without affecting users.
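  To make the role-based idea concrete, here is a minimal sketch in Python. It assumes a hypothetical table mapping roles to the (environment, action) pairs they may perform; the role names, actions, `User`, `is_allowed`, and `alter_schema` are all illustrative and are not AWS IAM's policy model. A developer can reorder columns in staging, but only an operator can do so in production.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical role -> permissions table. Each role lists the
# (environment, action) pairs it may perform; this is NOT AWS IAM syntax.
ROLE_PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "developer": {
        ("staging", "read"),
        ("staging", "alter_schema"),
        ("production", "read"),
    },
    "operator": {
        ("staging", "read"),
        ("staging", "alter_schema"),
        ("production", "read"),
        ("production", "alter_schema"),
    },
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def is_allowed(user: User, environment: str, action: str) -> bool:
    """Deny by default: allow only what the user's role explicitly grants."""
    return (environment, action) in ROLE_PERMISSIONS.get(user.role, set())


def alter_schema(user: User, environment: str, statement: str) -> str:
    """Refuse schema changes the caller's role does not permit."""
    if not is_allowed(user, environment, "alter_schema"):
        raise PermissionError(
            f"{user.name} ({user.role}) may not alter the schema in {environment}"
        )
    # A real tool would execute the statement here; the sketch only echoes it.
    return f"[{environment}] {statement}"


if __name__ == "__main__":
    dev = User("dev1", "developer")
    # Hypothetical MySQL statement that moves a column to the front of a table.
    sql = "ALTER TABLE slides MODIFY COLUMN title VARCHAR(255) FIRST"
    print(alter_schema(dev, "staging", sql))
    try:
        alter_schema(dev, "production", sql)
    except PermissionError as err:
        print("blocked:", err)
```

  Denying by default means that a role missing from the table, or an action nobody thought to list, gets no access at all. In AWS, the same rule would be expressed in IAM policies rather than in application code.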

 Knight Capital: The Dark Side of Automation

  Knight Capital paid a steep price for its DevOps failure. A global financial services firm, it had become the largest stock trader in the United States thanks to its high-frequency trading algorithms, with a 17.3% market share on the New York Stock Exchange.
  Knight Capital used an internal application called SMARS to route orders to the stock market. The application had been running for many years, and its code base contained many outdated parts. One of them was a feature called Power Peg, which was no longer used but had never been removed from the code base.
  In August 2012, Knight Capital deployed new code that reused a flag which had previously activated Power Peg. One of its eight servers did not receive the new code, so on that server the flag woke up the dormant feature instead. As a result, Knight Capital's system sent billions of dollars' worth of unintended orders into the market in about 45 minutes. The company lost about $460 million and had to be rescued by outside investors within days.
  The lesson: automation is very powerful, but used incorrectly it can cause major accidents. And over time, processes need to be updated, and new functionality needs to be introduced carefully so that it does not conflict with old code still lurking in the application.
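  One way to apply this lesson is a check that runs before a feature flag is switched on. The sketch below assumes a hypothetical fleet in which every server reports its build version, plus a hypothetical registry of flag names that once belonged to retired features; `RETIRED_FLAGS`, `stale_servers`, and `can_enable_flag` are illustrative names, not Knight Capital's actual system. The gate refuses to enable a flag if the name was used before or if any server is running a different build.

```python
from __future__ import annotations

# Hypothetical registry of flag names whose old features were removed;
# reusing one risks waking up dormant code on a server that missed a deploy.
RETIRED_FLAGS = {"power_peg"}


def stale_servers(server_versions: dict[str, str], expected: str) -> list[str]:
    """Return the servers that are not running the expected build."""
    return sorted(host for host, version in server_versions.items() if version != expected)


def can_enable_flag(
    flag: str, server_versions: dict[str, str], expected: str
) -> tuple[bool, str]:
    """Decide whether it is safe to enable `flag` across the whole fleet."""
    if flag in RETIRED_FLAGS:
        return False, f"flag '{flag}' belonged to a retired feature; pick a new name"
    if not server_versions:
        return False, "no servers reported a version"
    stale = stale_servers(server_versions, expected)
    if stale:
        return False, f"servers not on build {expected}: {', '.join(stale)}"
    return True, "all servers on the expected build"


if __name__ == "__main__":
    # Hypothetical fleet of eight servers, one of which missed the deployment.
    fleet = {f"srv{i}": "2.0" for i in range(1, 9)}
    fleet["srv8"] = "1.9"
    print(can_enable_flag("new_feature", fleet, "2.0"))
    # (False, 'servers not on build 2.0: srv8')
```

  The point of the design is that automation only proceeds when the fleet is verifiably in the expected state; a partial rollout stops the release instead of silently running two versions side by side.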

 Workflowy: Don't Split Your Database Lightly

  Workflowy is a simple, elegant productivity tool that has been growing steadily. To cope with user growth, its developers needed to optimize the architecture. As the database grew huge, they decided it would be a good idea to break it up into multiple smaller databases.
  In the process, they found that the split slowed down queries and kept users from accessing their data: some users could not sync data from their mobile devices, while others could not log in to the web application.
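  The article doesn't say how Workflowy split its data, so the following is only a sketch under an assumed scheme: rows are assigned to one of N in-memory "shards" by user ID. It illustrates why splitting can slow some queries down: a lookup for one user touches a single shard, but a query that needs every user's data must fan out to all of them. `ShardedStore` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class ShardedStore:
    """Toy in-memory store that splits rows across shards by user ID (assumed scheme)."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.shards: list[dict[int, list[str]]] = [
            defaultdict(list) for _ in range(num_shards)
        ]
        self.shards_touched = 0  # total shard visits made by queries so far

    def _shard_for(self, user_id: int) -> int:
        # Simple modulo routing; real systems often use consistent hashing instead.
        return user_id % self.num_shards

    def add(self, user_id: int, item: str) -> None:
        self.shards[self._shard_for(user_id)][user_id].append(item)

    def items_for_user(self, user_id: int) -> list[str]:
        # Single-shard query: the routing key says exactly where to look.
        self.shards_touched += 1
        return list(self.shards[self._shard_for(user_id)].get(user_id, []))

    def count_all_items(self) -> int:
        # Cross-shard query: with no routing key, every shard must be visited.
        total = 0
        for shard in self.shards:
            self.shards_touched += 1
            total += sum(len(items) for items in shard.values())
        return total


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for uid in range(10):
        store.add(uid, f"note-{uid}")
    store.items_for_user(3)      # visits 1 shard
    store.count_all_items()      # visits all 4 shards
    print(store.shards_touched)  # 5
```

  With one database, both queries would hit a single server; after the split, the cross-user query costs one round trip per shard, which is one way a well-intended split can make an application slower.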
  After some troubleshooting, the team discovered a bug they had never encountered before in their web server (Apache). But after they fixed it and tried to get everyone logged back in, the site went down again.
  In the end, they found that the process they had used to split the database was the root cause. They kept the site up and running by avoiding a certain kind of "slow query" to the database, and they also upgraded their infrastructure to speed up queries.
  The lesson: splitting a database can cause performance problems and even outages, but isolating the critical issues and resolving them first allows service to recover faster.
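  Isolating the critical issue usually starts with finding out which queries are slow. Databases have built-in tools for this (for example, MySQL's slow query log and PostgreSQL's `log_min_duration_statement` setting); the sketch below shows the same idea at the application level. `SlowQueryLog` is a hypothetical helper, and the 0.5-second threshold is chosen only for illustration.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SlowQueryLog:
    """Time each query and remember the ones slower than a threshold (illustrative)."""

    def __init__(
        self, threshold_s: float, clock: Callable[[], float] = time.perf_counter
    ) -> None:
        self.threshold_s = threshold_s
        self.clock = clock  # injectable so tests can use a fake clock
        self.slow: list[tuple[str, float]] = []

    def run(self, name: str, query: Callable[[], T]) -> T:
        """Run `query`, recording its duration if it exceeds the threshold."""
        start = self.clock()
        try:
            return query()
        finally:
            elapsed = self.clock() - start
            if elapsed > self.threshold_s:
                self.slow.append((name, elapsed))

    def worst(self, n: int = 5) -> list[tuple[str, float]]:
        """Slowest recorded queries first: the ones to fix (or avoid) before anything else."""
        return sorted(self.slow, key=lambda entry: entry[1], reverse=True)[:n]


if __name__ == "__main__":
    log = SlowQueryLog(threshold_s=0.5)
    log.run("fast lookup", lambda: sum(range(1000)))
    log.run("slow scan", lambda: time.sleep(0.6))  # stands in for an expensive query
    print(log.worst())
```

  Ranking by duration gives a short list of offenders, which is what lets a team work around the worst query first and restore service before tackling the rest.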

 IRS: Moving to the Cloud

  The Internal Revenue Service (IRS) isn't known for its technological prowess. After all, its job is to collect taxes, not drive technological innovation.
  That may help explain why the IRS application that processes tax returns failed in 2016. On Feb. 3, a voltage regulator began to fail on a server that processes millions of Americans' tax returns. While a technician worked to fix the problem, the backup voltage regulator failed as well.
  In fact, the IRS has been trying to update its main computer system for 40 years without success. Since the 2016 outage, there have been headlines such as "60-year-old IT system crashes on tax day due to new hardware" and "Computer failure causes thousands of taxpayers to receive error messages".
  The lesson: a lot can go wrong when you manage your own infrastructure. One solution is to move it to the cloud, which today is one of the most reliable ways to run applications. Leave the setup and maintenance of physical hardware to the cloud provider so you can focus on what matters most: running and improving your applications.
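  The IRS outage shows how a single server, even one with a backup part, can become a single point of failure. One pattern the cloud makes easier is running redundant instances in separate failure domains (such as availability zones) and routing around unhealthy ones. The sketch below assumes hypothetical instance and zone names and a caller-supplied health check; it is not any cloud provider's API.

```python
from __future__ import annotations

from typing import Callable, Sequence


def spread_across_zones(instances_by_zone: dict[str, list[str]]) -> list[str]:
    """Interleave instances zone by zone so consecutive fallbacks sit in different zones."""
    ordered: list[str] = []
    queues = [list(instances) for instances in instances_by_zone.values()]
    while any(queues):
        for queue in queues:
            if queue:
                ordered.append(queue.pop(0))
    return ordered


def pick_healthy(instances: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first instance that passes its health check, in priority order."""
    for instance in instances:
        if is_healthy(instance):
            return instance
    raise RuntimeError("no healthy instance available")


if __name__ == "__main__":
    zones = {"zone-a": ["a1", "a2"], "zone-b": ["b1"], "zone-c": ["c1"]}
    order = spread_across_zones(zones)
    print(order)  # ['a1', 'b1', 'c1', 'a2']
    down = {"a1", "b1"}  # pretend these two instances have failed
    print(pick_healthy(order, lambda instance: instance not in down))  # c1
```

  Interleaving by zone means a failure confined to one zone costs at most one extra health check before traffic moves to a different zone, rather than cycling through every instance that shares the same faulty hardware.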

  To conclude

  Whether it's databases, hardware infrastructure, legacy code, or cloud provider limitations, a myriad of unexpected issues can pop up when practicing DevOps. However, we can learn from these failures to avoid similar problems. For example, if the IRS had migrated its infrastructure to the cloud earlier, it might not have had to spend tens of millions of dollars a year fixing problems that shouldn't have occurred. And Knight Capital might have avoided its ruinous loss if it had used its automation correctly.
  It's important to consider, during a project's design phase, which tools or platforms can help implement product functionality well. One platform that aims to abstract away infrastructure differences, configure fine-grained access rights, automate linked testing, and provide intelligent alerts is SoFlu Software Robot.
  SoFlu Software Robot is a fully automated software engineering platform launched by FeiCount. It provides common technical function modules and supports components such as loops, conditionals, and function calls. Users program by configuring parameters via drag-and-drop according to their business logic. The platform also enforces a unified coding standard and does not rely on manual coding or code review, so it aims to ensure high code quality at the source. In short, SoFlu Software Robot meets low-barrier development needs through visual programming, and can develop software automatically from flowcharts.