I've been doing SQL administration for a long time. I improve performance through query optimization, hardware, configuration tuning, data aggregation, storage engine choices, and many other methods that I have learned through practical implementation and reading more books on performance optimization than I can count.
I often run into people who believe InnoDB is faster (it's not), that MyISAM is obsolete (it's not), that their queries are fine (they aren't), or that the hardware is the problem (it's not).
I hear ridiculous things like "We need to use Redis!"
Redis is a non-relational data store that is only 'faster' because it keeps everything in RAM. What many developers (and sysadmins, for that matter) do not understand is that MySQL does the same thing via the operating system's disk cache, or at the MySQL layer through the InnoDB buffer pool and the MyISAM key buffers. MariaDB offers a Cassandra storage engine if you really need a relationless key/value store, but keeping it all in memory is fifty layers of ridiculous unless you're fine with volatile data.
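To see that caching in practice: the InnoDB buffer pool is exactly the in-RAM cache people think they're buying with Redis. A hedged sketch of how I'd inspect and size it — the sizes in the comments are illustrative placeholders, not recommendations for any particular box:

```sql
-- How big the buffer pool is, and how many of its pages are actually in use:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW STATUS LIKE 'Innodb_buffer_pool_pages%';

-- In my.cnf, under [mysqld], you size these caches to hold the working set:
--   innodb_buffer_pool_size = 8G     (InnoDB data and index pages in RAM)
--   key_buffer_size         = 512M   (MyISAM index blocks in RAM)
```

If the working set fits in the buffer pool, reads never touch disk — the same property Redis is being sold on.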
"Why aren't we using Hadoop!"
Hadoop is not a relational database, and it is far slower than MySQL as a reporting layer. They are NOT competing platforms. Hadoop is for long-term storage of your data in its original format (or as dim/fact tables). You typically use Hadoop to aggregate your click streams into a reporting layer, or directly into your analytics platform for unmolested searching. Hadoop is not a performance solution; it is a storage and aggregation solution.
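What that aggregation step looks like, as a Hive-style sketch — the table and column names here are hypothetical, standing in for whatever your click-stream schema actually is:

```sql
-- Roll the raw click stream up into a daily summary that gets shipped
-- to the MySQL reporting layer; Hadoop does the heavy scan once, offline.
INSERT OVERWRITE TABLE clicks_daily
SELECT page_id,
       to_date(event_ts) AS day,
       COUNT(*)          AS clicks
FROM raw_clicks
GROUP BY page_id, to_date(event_ts);
```

The full scan of raw_clicks is exactly the kind of work Hadoop is built for, and exactly the kind of work you don't want hammering your reporting database.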
"Why aren't we using a Netezza!"
Netezza is a relational database that uses FPGAs programmed from your SQL queries. It sieves your data in its entirety through its SPUs for every query. Netezza is blazing fast for analytics: if you're used to your queries taking hours to run, they will finish in minutes on Netezza. It is, however, slow for reporting because it does not use indexes. You should never implement it for user- or customer-facing data, only for business analytics.
Don't even get me started on document-store databases.
Now for my rant: it pisses me off that I have to have this conversation with every single group of developers I meet. They have absolutely no idea how their queries impact performance. I cannot fix every query en masse, because I have to reverse engineer their logic to figure out what they are truly trying to ascertain.
Typically the best result comes from aggregating their data beforehand, i.e. putting the data into tables exactly as their GROUP BY queries would produce it, with the joined data already there.
For instance (table names replaced with placeholders t1, t2, t3): select a.blah, t3.noty from ( select t1.blah from t1 join t2 on t2.loot = t1.blah group by t1.blah ) a join t3 on t3.id = a.blah order by t3.noty
I would fix this query by analyzing what it does. The subquery has to aggregate and join before the outer join can run and the result can finally be ordered by noty. I would take that aggregate, create a new table holding it, and populate that table during our ETL process to maintain consistency. Now when they run the query, they can simply do select blah from newTable order by noty; with an index on that table, the response is instant, versus minutes before.
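A minimal sketch of that fix. newTable, its columns, and the t1/t2/t3 source tables are all placeholders standing in for the real schema, and the column types are assumptions:

```sql
-- Summary table populated during ETL; the index on noty serves the ORDER BY.
CREATE TABLE newTable (
  blah INT NOT NULL,
  noty INT NOT NULL,
  KEY idx_noty (noty)
) ENGINE=InnoDB;

-- ETL step: materialize the aggregate once, on our schedule, not theirs.
INSERT INTO newTable (blah, noty)
SELECT a.blah, t3.noty
FROM ( SELECT t1.blah
       FROM t1
       JOIN t2 ON t2.loot = t1.blah
       GROUP BY t1.blah ) a
JOIN t3 ON t3.id = a.blah;

-- What the developers run now; the ORDER BY reads idx_noty directly.
SELECT blah FROM newTable ORDER BY noty;
```

The expensive aggregation happens once per ETL run instead of once per report, and the reporting query degenerates into an indexed read.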
It blows my mind that people think the solution is more hardware, more CPU, more RAM, faster disks, without understanding that they are using MySQL to aggregate volumes of data for reporting, which is like using a vacuum cleaner to mow the yard.
Anyway, I am off to buy eight SSDs and overclock a server, because you can only scale a CPU horizontally so much, and it's been requested that I just do it.