07 Oct 2014
- Formatting is important
- A source code file should read like a newspaper article.
- The file name should be simple and explanatory.
- Top parts of the file should provide high-level concepts.
- The lower in the file, the more detailed the information.
- Blank lines help in identification of separate concepts.
- Pieces of code which are related should be kept close in source code.
- Instance variables should live at the top of the class file.
- Local variables should be declared as close as possible to the place of usage.
- A function that is called should be placed below the function that calls it.
- Lines should be at most 120 columns wide. (Uncle Bob’s personal preference)
- Preserve indentation of blocks.
Set team formatting rules and make everyone use them!
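To make the newspaper metaphor concrete, here is a minimal sketch in Python. All names in it (`build_report`, `select_paid`, `sum_amounts`, `format_summary`) and the shape of the order records are made up for illustration; they do not come from the book. The high-level function sits at the top, the functions it calls follow below in call order, blank lines separate the concepts, and the local variable is declared right where it is used.

```python
"""Report builder laid out top-down, like a newspaper article."""


def build_report(orders):
    """Headline: the high-level story of what this module does."""
    paid = select_paid(orders)
    total = sum_amounts(paid)
    return format_summary(len(paid), total)


def select_paid(orders):
    """Called by build_report, so it sits directly below it."""
    return [order for order in orders if order["paid"]]


def sum_amounts(orders):
    """Lower in the file means more detail."""
    total = 0  # declared next to the loop that uses it
    for order in orders:
        total += order["amount"]
    return total


def format_summary(count, total):
    """Lowest level: pure formatting detail."""
    return f"{count} paid orders, total {total}"
```

A reader who only needs the gist can stop after `build_report`; everything below it is detail.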
06 Oct 2014
- It is better to refactor code than to comment it.
- Maintaining code takes time and requires great discipline, which is usually hard in fast-paced projects.
- Don’t rely too much on comments: they get outdated very easily. This makes them inaccurate, irrelevant, misleading or simply false.
- Instead of commenting out code, throw it away. We have a copy of it in the version control system.
- “HTML in source code comments is an abomination”.
- Most comments are useless (too far from code, too obvious, just mumbling, not accurate enough).
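A small sketch of what "refactor instead of comment" can look like, in Python. The loyalty-discount rule, the constants and the customer dictionary layout are all hypothetical, invented only to show the idea: the explanatory comment in the first version becomes a function name in the second, so there is nothing left to go stale.

```python
from datetime import date

# Before: the comment has to explain what the condition means.
def discount_before(customer, today):
    # check if customer is eligible for the loyalty discount
    if (today - customer["joined"]).days >= 365 and customer["orders"] >= 10:
        return 0.1
    return 0.0


# After: the names carry the explanation, so the comment is unnecessary.
LOYALTY_DAYS = 365
LOYALTY_MIN_ORDERS = 10
LOYALTY_DISCOUNT = 0.1


def is_loyal(customer, today):
    membership_days = (today - customer["joined"]).days
    return membership_days >= LOYALTY_DAYS and customer["orders"] >= LOYALTY_MIN_ORDERS


def discount_after(customer, today):
    return LOYALTY_DISCOUNT if is_loyal(customer, today) else 0.0
```

Both versions behave the same; only the second one can be read without the comment.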
When comments might be a good idea:
- Legal obligation to write specific comments (copyrights).
- Clarification of code which just can’t be written in a more expressive way (quite rare).
- Docs on public API.
What I disagree with:
- TODO comments are bad, not good. Instead of a TODO comment there should be a task in the backlog explaining what should be corrected.
05 Oct 2014
Main thoughts I want to remember from that chapter:
Remember: functions are not written correctly at first. Write something meeting functional requirements with TDD and then refactor mercilessly.
04 Oct 2014
It is sometimes beneficial to have a constant, single-row table in a database. RDBMS engines like Oracle or DB2 have tables like DUAL or SYSIBM.SYSDUMMY1. In Hive there is no such thing by default … But why not create a custom one? The easiest way (which I think works on all (???) Linux boxes) is to create one based on /etc/hostname.
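-- Hive has no CREATE OR REPLACE TABLE, so drop and recreate instead.
DROP TABLE IF EXISTS dual;
CREATE TABLE dual (dummy STRING);
-- Load any small local file just to get at least one row into the table.
LOAD DATA LOCAL INPATH '/etc/hostname' OVERWRITE INTO TABLE dual;
-- Replace the loaded contents with a single constant row.
INSERT OVERWRITE TABLE dual
SELECT
"X" AS dummy
FROM
dual
LIMIT 1;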
Hacky… But on the other hand pretty simple.
The other way to do it is to write a custom UDTF. Check how it can be done with this project: github
Oracle docs on DUAL: Oracle docs
03 Oct 2014
You learn something new all the time … I’ve just found out that there is a wonderfully simple way of writing equality conditions that also match NULL values in Hive: the <=> operator.
I used to write something like this:
...
WHERE
column1 = column2 OR (column1 IS NULL AND column2 IS NULL)
Now it boils down to:
WHERE
column1 <=> column2
For reference: NULL = NULL does not evaluate to true in SQL. By definition it evaluates to NULL (unknown), so a WHERE clause filters such rows out.
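The difference is easy to try out without a Hive cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption on my side: SQLite has no <=> operator, but its IS operator compares two columns in the same NULL-safe way, so it plays the role of <=> here. The table `t`, its four sample rows and the helper `count_matches` are made up for the demonstration.

```python
import sqlite3


def count_matches(where_clause):
    """Count rows of a small sample table that satisfy the given WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(1, 1), (1, 2), (1, None), (None, None)],
    )
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM t WHERE {where_clause}"
    ).fetchone()
    conn.close()
    return count


# Plain equality: the (NULL, NULL) row is filtered out.
plain = count_matches("column1 = column2")

# The long form with explicit IS NULL checks also matches (NULL, NULL).
long_form = count_matches(
    "column1 = column2 OR (column1 IS NULL AND column2 IS NULL)"
)

# SQLite's IS operator, standing in for Hive's <=> in this sketch.
null_safe = count_matches("column1 IS column2")

print(plain, long_form, null_safe)  # prints "1 2 2"
```

The long form and the NULL-safe operator select the same rows; plain equality silently drops the pair where both sides are NULL.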