This is probably well known issue for everyone having some MySQL experience or experience with any other SQL database. Still I see this problem in many production applications so it is worth to mention it, especially as it is connected to MySQL Performance. No it might not affect MySQL Performance per say but it limits our ability tune MySQL Performance as queries become fragile – changing execution plan leads to different query results.

So what I’m speaking about ?

Lets say you have query something like SELECT A,B,MAX(C) FROM TBL GROUP BY A – what would you expect from such query ? Column A is part of group by so its value is same for whole group. MAX(C) is also particular value for each group, while B is not part of GROUP BY and may well correspond to different values. Which one are you looking to get ? In fact this is where results becomes non-deterministic and fragile – any B from the group could be returned, while you might be expecting some particular one.

Many other DBMS and ANSI SQL Simply forbids such queries, MySQL is more permissive and will return you first B it runs into. This value however can be dependent on selected execution plan and change if you add some indexes. Also plan may change as database statistics changes and your query may become broken without any reason.

My Advice is to stay away from such group by statements, however as usually there are exceptions and such GROUP BY statements may be faster than alternatives. Assume for example we have non-normalized table (or join result) where two columns have 1-1 relationship. For example we have user id and login which are both unique. In this case running

SELECT id,login,max(login_time) FROM log GROUP BY id,login May be replaced by SELECT id,login,max(login_time) FROM log GROUP BY id which may be faster especially if GROUP BY is executed via sorting.

In this case even though we do not know which value from the group will be returned we do not care as it will be same for all rows.

9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Lukas

I recommend people enable the sql mode to prevent these kinds of queries:

#

ONLY_FULL_GROUP_BY

Do not allow queries for which the SELECT list refers to non-aggregated columns that are not named in the GROUP BY clause. The following query is invalid with this mode enabled because address is not named in the GROUP BY clause:

SELECT name, address, MAX(age) FROM t GROUP BY name;

As of MySQL 5.0.23, this mode also restricts references to non-aggregated columns in the HAVING clause that are not named in the GROUP BY clause.

Sheeri

especially if GROUP BY is executed via sorting.

Um, in your example, you say that id and login are unique — I’m assuming that’s enforced with UNIQUE KEY, right? The internals manual states that GROUP BY uses sorting when there is no index…..so while you are correct in the general case, that GROUP BY may be faster if it does not have to sort, your example does not fit your point.

The query in your example is a bad one, but not because of the optimizer’s behavior. The query is poor (and I’m sure it’s straight from clients who need your help!) because you do not NEED to GROUP BY 2 elements if they have a 1:1 relationship. Folks need to pick one element in that case. I’m sure other DBMS’ don’t handle this case “right” either.

This can be summed up with a “Best practice” — “Do not GROUP BY more than you need to.”

see
http://dev.mysql.com/doc/internals/en/optimizer.html
under “GROUP BY”

cemozdemir

CREATE TABLE tblmakale (makaleid tinyint(3) unsigned NOT NULL default ‘0’, uid tinyint(3) unsigned default ‘0’, makaleicerik varchar(255) default NULL, puan int(3) unsigned default NULL, PRIMARY KEY (makaleid), KEY NewIndex (puan,makaleid,makaleicerik,uid)) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO tblmakale VALUES(“3”, “2”, “gjgh”, “0”);
INSERT INTO tblmakale VALUES(“12”, “5”, “545645645”, “1”);
INSERT INTO tblmakale VALUES(“2”, “3”, “ghjghjghjghj”, “2”);
INSERT INTO tblmakale VALUES(“4”, “1”, “gg”, “3”);
INSERT INTO tblmakale VALUES(“9”, “7”, “thgjnm”, “4”);
INSERT INTO tblmakale VALUES(“0”, “2”, “fdsd”, “5”);
INSERT INTO tblmakale VALUES(“8”, “6”, “rtret”, “5”);
INSERT INTO tblmakale VALUES(“11”, “6”, “ghnb”, “6”);
INSERT INTO tblmakale VALUES(“10”, “5”, “ewrt65”, “7”);
INSERT INTO tblmakale VALUES(“1”, “5”, “fgjhgj”, “10”);
INSERT INTO tblmakale VALUES(“5”, “0”, “g”, “10”);
INSERT INTO tblmakale VALUES(“13”, “3”, “12323423”, “11”);
INSERT INTO tblmakale VALUES(“14”, “2”, “reggfjmnljçpı”, “12”);
INSERT INTO tblmakale VALUES(“16”, “0”, “iiiiiiiiiiiiii”, “18”);
INSERT INTO tblmakale VALUES(“6”, “0”, “hhh”, “20”);
INSERT INTO tblmakale VALUES(“15”, “2”, “ÅŸliÅŸlilÅŸilÅŸi”, “21”);
INSERT INTO tblmakale VALUES(“7”, “3”, “jkll”, “22”);

select a.uid, a.makaleicerik, a.puan, a.makaleid from tblmakale a
inner join tblmakale b on a.uid = b.uid group by a.uid, a.makaleicerik, a.puan having a.puan = max(b.puan)

jawaharlal

select sum of marks for all six columns and display the count more than one row.
my query is like following

select max(marks) from xyz where class=’abc’ group by sname;

marcus

SELECT name, address, MAX(age) FROM t GROUP BY name;

isnt this sql statement wrong? if you are grouping by name, you get a subset of address and age. on age you are applying an aggregate operator (MAX), but which item of the subset is seleected from address?

if table name, address, age contains

meier, street a, 23
meier, street b, 26

the group-by subset should be meier, (street a, street b), (23, 26)

max means

meier, (???), 26

so i’d expect sorta subset instead of questionmark. which aggregate operator is applied here? and, isnt it fatal for development to select apples and max(price) while grouping by peaches?

regards,
marcus

PesAfTeftReem

Bloomberg Drug News anxiety medication buspar BuSpar is available as tablets for oral administration containing 5 mg, 10 mg, 15 mg, or 30 mg of buspirone hydrochloride. http://www.kindercareschool.com/ – buy buspar online

Abduraoof

Using group by we get error
field is not in list
eg : (select name form user group by name)
name is not in list

Abduraoof

Using group by we get error
field is not in list
eg : (select name from user group by name)
name is not in list