
Techniques to Optimize MySQL: Indexes, Slow Queries, Configuration


MySQL is widely regarded as the world’s most popular relational database, yet many installations run with settings that have never been optimized. Instead of investigating further, many people simply leave everything at its default values. The techniques below combine long-standing advice with newer methods that have appeared since, covering configuration, indexes, and bottlenecks.

Configuration Optimization

The first thing every MySQL user should do is upgrade to MySQL 5.7, which has better defaults than previous versions, and then review the configuration. On a Linux-based host, the configuration typically lives in /etc/mysql/my.cnf. Your installation might also load a secondary configuration file into that one, so if the my.cnf file doesn’t contain much content, check /etc/mysql/mysql.conf.d/mysqld.cnf instead.

Editing Configuration

Before editing the configuration, make sure you’re comfortable on the command line. If you’re editing locally on a Vagrant box, you can copy the file out into the shared folder with cp /etc/mysql/my.cnf /home/vagrant/Code, edit it with a regular text editor, and copy it back into place when done. Otherwise, edit it directly with a simple text editor such as vim by executing sudo vim /etc/mysql/my.cnf.

Manual Tweaks

Out of the box, you should make the following manual tweaks under the [mysqld] section of the config file:

innodb_buffer_pool_size = 1G # (adjust value here, 50%-70% of total RAM)

innodb_log_file_size = 256M

innodb_flush_log_at_trx_commit = 1 # may change to 2 or 0

innodb_flush_method = O_DIRECT

  • innodb_buffer_pool_size

The buffer pool caches data and indexes in memory, so frequently accessed data can be served straight from RAM. When you’re running a dedicated or virtual server where the database is often the bottleneck, it makes sense to give this part of your app(s) the most RAM, up to 50%–70% of total RAM. For example, on a server with 4 GB of RAM, a buffer pool of roughly 2–2.5 GB is a reasonable starting point.

  • innodb_log_file_size: the important point here is how much data to store in a log before it is flushed. A log in this case relates to checkpoint time, because with MySQL, writes happen in the background but still affect foreground performance. Bigger log files mean better performance, because fewer checkpoints are created; however, recovery takes longer when there is a crash.
  • innodb_flush_log_at_trx_commit controls what happens with the log file on commit. A value of 1 is the safest setting, because the log is flushed to disk after every transaction. With 0 or 2, it’s less ACID but more performant. In most cases the difference isn’t big enough to outweigh the stability benefits of keeping it at 1.
  • innodb_flush_method: this gets set to O_DIRECT to avoid double buffering. You should always do this, unless the I/O system has very low performance. After restarting MySQL, you can verify which values are actually in effect, as shown below.
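A quick way to confirm the settings took effect is to query the server variables. This is a minimal sketch; the variable names are the ones configured above:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_flush_method';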

Variable Inspector

Here are the steps to install the variable inspector on Ubuntu:

wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-get install percona-toolkit

 

Similar instructions are available for other systems.

Then, run the toolkit with:

pt-variable-advisor h=localhost,u=homestead,p=secret

The output will include warnings and notes such as these:

# WARN delay_key_write: MyISAM index blocks are never flushed until necessary.
# NOTE max_binlog_size: The max_binlog_size is smaller than the default of 1GB.
# NOTE sort_buffer_size-1: The sort_buffer_size variable should generally be left at its default unless an expert determines it is necessary to change it.
# NOTE innodb_data_file_path: Auto-extending InnoDB files can consume a lot of disk space that is very difficult to reclaim later.
# WARN log_bin: Binary logging is disabled, so point-in-time recovery and replication are not possible.

None of these are critical, so you don’t have to fix them. The only one worth addressing is binary logging, for replication and snapshot purposes:

max_binlog_size = 1G
log_bin = /var/log/mysql/mysql-bin.log
server-id = master-01
binlog-format = 'ROW'

  • The max_binlog_size setting determines how large binary logs can get. These logs record your transactions and queries and make checkpoints. If a transaction is bigger than the maximum, a log may grow beyond that size; otherwise, MySQL keeps logs at the limit.
  • The log_bin option turns on binary logging altogether; without it, snapshotting and replication are impossible. Note that binary logs can be very strenuous on disk space. Activating binary logging also requires a server ID, which tells the logs which server they came from.

With its sane defaults, the new MySQL is nearly production ready out of the box. Still, every app is different and will have additional custom tweaks that apply.

MySQL Tuner

The main purpose of MySQL Tuner is to monitor a database over longer intervals and suggest changes based on what it has seen in the logs.

To install it, simply download it:

wget https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
chmod +x mysqltuner.pl

When you run it with ./mysqltuner.pl, it will ask for the database’s admin username and password and then output information from its quick scan. Here is an excerpt:

[] InnoDB is enabled.
[] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 1.0G/11.2M
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (50 %): 256.0M * 2/1.0G should be equal 25%
[!!] InnoDB buffer pool <= 1G and Innodb_buffer_pool_instances(!=1).
[] Number of InnoDB Buffer Pool Chunk : 8 for 8 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 96.65% (19146 hits/ 19809 total)
[!!] InnoDB Write Log efficiency: 83.88% (640 hits/ 763 total)
[OK] InnoDB log waits: 0.00% (0 waits / 123 writes)

 

Keep in mind that this tool is most useful once the server has been running for at least a week, and it’s worth running about once a week after that. You can also set up a cronjob to run it and mail you the results periodically; a sketch of such an entry follows the restart command below. Also, make sure to restart the MySQL server after every configuration change:

sudo service mysql restart
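A minimal cron entry might look like the following. This is only a sketch: the script path, credentials, and mail address are placeholders, and it assumes MySQLTuner’s --user/--pass options and a working mail command on the host.

# Run MySQLTuner every Monday at 06:00 and mail the report (paths and address are placeholders)
0 6 * * 1 /home/vagrant/mysqltuner.pl --user homestead --pass secret 2>&1 | mail -s "Weekly MySQLTuner report" admin@example.com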

Indexes

The easiest way to understand MySQL indexes is to look at the index of a book. When a book has an index, you don’t have to go through the whole book to find a subject; the index helps you find it much faster. In the same way, MySQL indexes speed up your SELECT queries. However, indexes also have to be created and stored, which makes UPDATE and INSERT queries slower and costs a bit more disk space. In general, you won’t notice the difference with updating and inserting if your table is indexed correctly, so it’s advisable to add indexes in the right locations.

Tables that contain only a few rows don’t really benefit from indexing. So how do we discover which indexes to add, and which types of indexes exist?
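A practical way to answer the first question is to run EXPLAIN on a slow query and see whether it uses an index at all. A minimal sketch (the users table and email column are only an example):

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- If the "key" column in the output is NULL and "rows" is close to the table size,
-- the query scans the whole table and the filtered column is a candidate for an index.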

Unique/Primary Indexes

Primary indexes are the main indexes of your data: the primary way a record is identified, such as a user ID, a username, or a main email for a user account. Primary indexes are unique: a value cannot be repeated within the set of data.

For example, once a user has selected a specific username, nobody else should be able to take it. Adding a "unique" index to the username column solves this: MySQL will complain if someone else tries to insert a row with a username that already exists.

ALTER TABLE `users` ADD UNIQUE INDEX `username` (`username`);

 

Unique indexes can be made on a single column or across multiple columns. For example, if usernames only have to be unique per country, you can add a unique index across both columns, so that a given username can only be taken once per country:

ALTER TABLE `users`
ADD UNIQUE INDEX `usercountry` (`username`, `country`);

 

Regular Indexes

Regular indexes are the easiest kind: they simply ease lookup. They’re very useful when you need to find data by a specific column or combination of columns quickly, without that data having to be unique.

ALTER TABLE `users`
ADD INDEX `usercountry` (`username`, `country`);

 

Fulltext Indexes

For full-text searches, you can use FULLTEXT indexes. Only the InnoDB and MyISAM storage engines support them, and they can only be created on CHAR, VARCHAR, and TEXT columns.

You will find these indexes very useful for all the text searching you might need to do. Finding words inside bodies of text is FULLTEXT’s specialty, so use it on posts, comments, descriptions, reviews, and the like.
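As a small sketch (the posts table and body column are hypothetical), creating and querying a FULLTEXT index looks like this:

ALTER TABLE `posts` ADD FULLTEXT INDEX `ft_body` (`body`);

SELECT * FROM `posts` WHERE MATCH(`body`) AGAINST('mysql optimization' IN NATURAL LANGUAGE MODE);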

Descending Indexes

Descending indexes are an addition in version 8+. When you have enormous tables that are frequently queried for the most recently added data, this index type comes in handy. Sorting in descending order was always possible, but it came at a small performance penalty; a descending index speeds things up.

CREATE TABLE t (
  c1 INT, c2 INT,
  INDEX idx1 (c1 ASC, c2 ASC),
  INDEX idx2 (c1 ASC, c2 DESC),
  INDEX idx3 (c1 DESC, c2 ASC),
  INDEX idx4 (c1 DESC, c2 DESC)
);

 

Consider applying DESC to an index when dealing with logs written in the database, posts and comments that are loaded last-to-first, and similar cases.
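For example, against the table defined above, a query that orders both columns descending can typically be served straight from idx4 without an extra sort pass (a sketch, not a benchmark):

SELECT * FROM t ORDER BY c1 DESC, c2 DESC LIMIT 10;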

Bottlenecks

This part explains how to detect and monitor bottlenecks in a database. To start, enable the slow query log in the configuration:

slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 1
log_queries_not_using_indexes = 1

The settings above log queries that take longer than one second as well as queries that don’t use indexes. Once this log has some data, you can analyze it for index usage with the pt-index-usage tool, or apply the pt-query-digest tool, whose results look like this:

pt-query-digest /var/log/mysql/mysql-slow.log

# 360ms user time, 20ms system time, 24.66M rss, 92.02M vsz
# Current date: Thu Feb 13 22:39:29 2014
# Hostname: *
# Files: mysql-slow.log
# Overall: 8 total, 6 unique, 1.14 QPS, 0.00x concurrency ________________
# Time range: 2014-02-13 22:23:52 to 22:23:59
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time            3ms   267us   406us   343us   403us    39us   348us
# Lock time          827us    88us   125us   103us   119us    12us    98us
# Rows sent             36       1      15    4.50   14.52    4.18    3.89
# Rows examine          87       4      30   10.88   28.75    7.37    7.70
# Query size         2.15k     153     296  245.11  284.79   48.90  258.32
# ==== ================== ============= ===== ====== ===== ===============
# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x728E539F7617C14D  0.0011 41.0%     3 0.0004  0.00 SELECT blog_article
#    2 0x1290EEE0B201F3FF  0.0003 12.8%     1 0.0003  0.00 SELECT portfolio_item
#    3 0x31DE4535BDBFA465  0.0003 12.6%     1 0.0003  0.00 SELECT portfolio_item
#    4 0xF14E15D0F47A5742  0.0003 12.1%     1 0.0003  0.00 SELECT portfolio_category
#    5 0x8F848005A09C9588  0.0003 11.8%     1 0.0003  0.00 SELECT blog_category
#    6 0x55F49C753CA2ED64  0.0003  9.7%     1 0.0003  0.00 SELECT blog_article
# ==== ================== ============= ===== ====== ===== ===============
# Query 1: 0 QPS, 0x concurrency, ID 0x728E539F7617C14D at byte 736 ______
# Scores: V/M = 0.00
# Time range: all events occurred at 2014-02-13 22:23:52
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         37       3
# Exec time     40     1ms   352us   406us   375us   403us    22us   366us
# Lock time     42   351us   103us   125us   117us   119us     9us   119us
# Rows sent     25       9       1       4       3    3.89    1.37    3.89
# Rows examine  24      21       5       8       7    7.70    1.29    7.70
# Query size    47   1.02k     261     262  261.25  258.32       0  258.32
# String:
# Hosts        localhost
# Users        *
# Query_time distribution
#   1us
#  10us
# 100us  ################################################################
#   1ms
#  10ms
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS LIKE 'blog_article'\G
#    SHOW CREATE TABLE `blog_article`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT b0_.id AS id0, b0_.slug AS slug1, b0_.title AS title2, b0_.excerpt AS excerpt3, b0_.external_link AS external_link4, b0_.description AS description5, b0_.created AS created6, b0_.updated AS updated7 FROM blog_article b0_ ORDER BY b0_.created DESC LIMIT 10

 

You can also analyze these logs by hand, but first you have to export the log into a more “analyzable” format, which can be done like this:

mysqldumpslow /var/log/mysql/mysql-slow.log

Additional parameters let you filter the data and make sure only the important things are exported, for example the top 10 queries sorted by average execution time:

mysqldumpslow -t 10 -s at /var/log/mysql/localhost-slow.log

Summary

The techniques above will help make MySQL fly. When you have to deal with configuration optimization, indexes, and bottlenecks, don’t hesitate to apply them.

Ways to Make your Apps Serverless


The rise of a new buzzword has led some people to think that servers no longer exist, but the fact is that a server is still needed somewhere; this is why the term “serverless” can mislead. What “serverless” really means is that you can build your applications without deploying code to your own servers. For web developers, that brings the dream of spending less time worrying about servers and more time building software closer to reality.

Serverless in Action

When a site serves many readers a month, the traffic at that scale is significant and sudden, since articles can go viral at any moment. It becomes hard to keep up, and engineers end up spending too much time on operations. As a solution, it is worth taking a look at serverless platforms, which promise projects that are more maintainable, easier to operate, and cheaper.

Amazon Web Services

Serverless has a close relationship with Amazon Web Services (AWS). In fact, AWS answers one critical question: where does the custom code go? The concept of using third-party services and platforms is not new; databases, push notifications, caching, and many other layers of an application have been available “as a service” for a while, but they sat on the edge of your application, and a server was still needed as the home for the core application code that responds to external requests. With AWS Lambda and AWS API Gateway, you can deploy custom application code without the overhead of managing your own servers.

AWS Lambda

Lambda is Amazon’s version of functions-as-a-service (FaaS), and using it is quite simple: you write code and upload it, and AWS runs it in response to events including HTTP requests, S3 uploads, DynamoDB updates, Kinesis streams, and many others. Scaling happens automatically, and you are only charged while your functions are running.
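To give a sense of what “write code and upload it” looks like, here is a minimal sketch of a Node.js (TypeScript) handler behind API Gateway’s proxy integration; the file name, greeting, and event field used here are illustrative rather than part of any specific project:

// handler.ts — minimal Lambda handler for an API Gateway proxy integration (sketch)
export const handler = async (event: { path?: string }) => {
  // The proxy integration expects a statusCode and a string body in the response.
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello from ${event.path ?? "Lambda"}` }),
  };
};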

None of these features is strictly a requirement for serverless, but AWS has certainly set the bar high. Because of that precedent, any serverless platform is likely to have a stateless FaaS offering with very granular billing.

Other Platforms

Right now, Amazon may be the biggest player in the arena, but other providers are catching up quickly. All the major cloud platforms have recently launched services targeted at serverless applications. Here are a few of them:

  • Google Cloud Functions: Still in alpha, it offers almost the same functionality as AWS Lambda and can also be triggered by HTTP requests.
  • Azure Functions: Relatively new and similar to Lambda, it also has a pleasant UI and makes it easy to expose functions over HTTP without needing a separate routing service.
  • IBM OpenWhisk: The only open-source platform of the group. It is worth investigating if you are interested in deploying your own serverless platform, or if you are just curious about how these services work under the hood.

Challenges

Serverless is not the solution to every problem, and it does not come without its challenges. Because the space is so new, the community is still discovering best practices, especially when it comes to operations: we still need tools for deploying, maintaining, and monitoring our applications. The expectation, though, is that many new startups and third-party services will appear to solve these problems for serverless developers.

Tools

Thanks to an active open-source community, it is possible to build and deploy serverless applications by hand, but we suggest using an existing framework: beyond a few endpoints, building, packaging, zipping, uploading, and versioning all become difficult to manage. Here are some frameworks you might want to consider:

  • Serverless framework: This framework has a robust plugin system and integrates with many community-developed plugins. Its stated goal is to eventually support deployment to any of the major cloud platforms.
  • Apex: Written in Go, it supports the Python, Node.js, and Java runtimes. Its creator, TJ Holowaychuk, is a well-known fixture in the open-source community with a great sense of what makes for good developer tools.
  • Chalice: The only framework created and maintained by AWS itself; it currently supports Python.
  • Shep: Bustle’s own open-source framework, which it uses for all of its production services. It focuses on the Node.js runtime and is opinionated about how you should structure, build, and deploy applications.

It seems that in 2017 “serverless” technology will keep growing, with rapid adoption from startups to Fortune 500 companies, as more and more developers realize that the serverless movement can help them build better software.

Json-api-normalizer: Why JSON API and Redux Work Best When Used Together

As web developers, we have to manage the data needed for every application we work on, and doing so always involves the same steps:

  1. Fetch data from the back end.
  2. Store it somewhere locally in the front-end application.
  3. Retrieve the data from the local store and format it as needed by the specific view or screen.

In this article, we’ll discuss working with data from plain JSON, JSON API, and GraphQL back ends, and from that derive a practical way to manage front-end application data. As a realistic scenario, let’s imagine that we have carried out a survey that asks the same questions of many users. After each user has given their answers, other users can comment on them if they want to. Our web app performs a request to the back end, stores the gathered data in the local store, and renders the content on the page. To keep things simple, we’ll leave out the answer-creation flow.

Redux Best Practices

What makes Redux great is that it is agnostic about the kind of API you consume: you can switch your API from plain JSON to JSON API or even GraphQL and back during development, and as long as you keep your data model, the implementation of your state management is not affected. Below are the best practices for using Redux:

  1. Keep Data Flat in the Redux Store

First, here’s the data model: we have a question data object that might have many post objects, each post might have many comment objects, and each post and comment has exactly one author.

Let’s say we have a back end that returns a JSON response with a carefully nested structure. If you store the data in the same shape in your store, you will face problems later; for instance, you might store the same object many times, like this:

{
  "text": "My Post",
  "author": {
    "name": "Yury",
    "avatar": "avatar1.png"
  },
  "comments": [
    {
      "text": "Awesome Comment",
      "author": {
        "name": "Yury",
        "avatar": "avatar1.png"
      }
    }
  ]
}

In the example above, we store the same Author object in several places, which is bad: not only does it need more memory, but it also has negative side effects. If somebody changed the user’s avatar in the back end, you would have to traverse the whole state and update every instance of the same object.

To prevent that from happening, we can store the data in a flattened structure. This way, each object is stored only once and is easily accessible:

{
  "post": [{
    "id": 1,
    "text": "My Post",
    "author": { "id": 1 },
    "comments": [ { "id": 1 } ]
  }],
  "comment": [{
    "id": 1,
    "text": "Awesome Comment"
  }],
  "author": [{
    "name": "Yury",
    "avatar": "avatar1.png",
    "id": 1
  }]
}

  2. Store Collections as Maps Whenever Possible

Once we have the data in a nice flat structure, we can gradually accumulate the data we receive, so that we can reuse it as a cache, improve performance, or support offline use. However, when we merge new data into existing storage, we still have to select only the data objects relevant to a specific view. To achieve this, we can store the structure of each JSON document separately, recording which data objects were provided in each specific request. That gives us a list of data object IDs, which we can use to gather the data from storage.

Let’s say we need the friends lists of two different users, Alice and Bob. We’ll perform two requests to gather them and review the contents of our storage after each one, supposing that the storage is empty at the start.

/ALICE/FRIENDS RESPONSE

The response contains a User data object with an ID of 1 and the name Mike:

{
  "data": [{
    "type": "User",
    "id": "1",
    "attributes": {
      "name": "Mike"
    }
  }]
}

/BOB/FRIENDS RESPONSE

This is another request that would return a User with the ID of 2 and Kevin as the name:

{
  "data": [{
    "type": "User",
    "id": "2",
    "attributes": {
      "name": "Kevin"
    }
  }]
}

STORAGE STATE

This is what our storage state would look like:

{
  "users": [
    {
      "id": "1",
      "name": "Mike"
    },
    {
      "id": "2",
      "name": "Kevin"
    }
  ]
}

STORAGE STATE WITH META DATA

In order to find out which data objects in storage are relevant to which request, we also have to keep the structure of the JSON API document. With that in mind, we can change the storage to this:

{
  "users": [
    {
      "id": "1",
      "name": "Mike"
    },
    {
      "id": "2",
      "name": "Kevin"
    }
  ],
  "meta": {
    "/alice/friends": [
      {
        "type": "User",
        "id": "1"
      }
    ],
    "/bob/friends": [
      {
        "type": "User",
        "id": "2"
      }
    ]
  }
}

With this, we can now read the meta data and gather all mentioned data objects. Now here’s the recap of the operations’ complexities:
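The comparison referred to here (reconstructed in text form, since the original table image is not included) is roughly the following for working with an element by ID:

Operation             Array   Map
Find element by ID    O(n)    O(1)
Update element by ID  O(n)    O(1)
Delete element by ID  O(n)    O(1)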

As this comparison shows, maps work better than arrays here, because all of these operations are O(1) instead of O(n). If we use a map instead of an array for the User data objects, the storage would look like this:

STORAGE STATE REVISED

{
  "users": {
    "1": {
      "name": "Mike"
    },
    "2": {
      "name": "Kevin"
    }
  },
  "meta": {
    "/alice/friends": [
      {
        "type": "User",
        "id": "1"
      }
    ],
    "/bob/friends": [
      {
        "type": "User",
        "id": "2"
      }
    ]
  }
}

Now with this simple method, we can find a specific user by ID almost instantly.

Processing the Data and JSON API

There are many solutions for converting JSON documents to a Redux-friendly form. Normalizing functions that take a JSON document together with a predefined schema work great if your data model is known in advance and doesn’t change significantly during the application’s lifecycle, but they fail if things are too dynamic.

Using GraphQL might be possible and interesting as well; however, if our APIs are consumed by many third parties, we can’t simply adopt it.

JSON API and Redux

Redux and the JSON API work best when used together. The JSON API delivers data in a flat structure by definition, which conforms nicely with Redux best practices, and the data is typified, so it can be naturally saved in Redux’s storage in a map of the form type → map of objects.

There are things to consider, though. The JSON API splits data objects into two groups, “data” and “included”; storing those as two separate entities in the Redux store would violate Redux best practices, because the same data objects would be stored more than once.

To solve these problems, we can use the main features of json-api-normalizer (a usage sketch follows the list):

  • It merges the data and included fields, normalizing the data.
  • Collections are converted into maps in the form id => object.
  • The response’s original structure is stored in a special meta object.
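In practice the conversion is a single call. The sketch below assumes json-api-normalizer’s default export and a jsonApiResponse variable holding a raw JSON API document; the output shape is summarized only roughly in the comment:

import normalize from 'json-api-normalizer';

// `jsonApiResponse` is a raw JSON API document fetched from the back end (placeholder).
const normalized = normalize(jsonApiResponse);
// Roughly: { <type>: { "<id>": { id, type, attributes, ... } }, meta: { ... } }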

First, the distinction between data and included data objects was introduced in the JSON API specification to solve problems with redundant structures and circular dependencies. Second, data in Redux is updated constantly, but gradually, which helps with performance.

Now that you know why the JSON API and Redux work so well together, it should be clear that this approach helps us prototype much faster and stay flexible when the data model changes. If you are in doubt about whether to use Redux with the JSON API, hopefully this article has shown why you shouldn’t be.