Thursday, May 28, 2020

Java Multithreading


Why do we need threads?

  1.  Responsiveness - can be achieved with Concurrency (Multitasking)
  2.  Performance      - can be achieved with Parallelism



Context Switching
  •  Context switching is expensive: the OS has to save the state of the running thread and restore the state of the next one
  •  Context switching between threads of the same process is a lot cheaper than context switching between processes
  •  Too many threads: the OS spends more time on management (thrashing) than on real productive work
  •  Threads consume fewer resources than processes.

Thread scheduling 
  • There are several possible scheduling strategies
    • First Come First Serve - if long threads arrive first, the other threads have to wait and the application becomes unresponsive; this is called starvation
    • Shortest Job First - this time the longest jobs may wait indefinitely
    • Epochs - the OS divides CPU time into moderately sized pieces called epochs. In each epoch, the OS allocates a different time slice to each thread, according to a dynamic priority calculation.
Thread creation & its methods
  • Two ways of creating threads
    • Implement the Runnable interface and pass the instance to the Thread constructor
    • Extend the Thread class and override run()
  • For CPU-bound work, the number of threads should roughly equal the number of cores in the machine
  • Use thread.setUncaughtExceptionHandler() to catch unchecked exceptions thrown at run time.
    In the handler you can either clean up resources or log the issue for troubleshooting purposes
  • There are two ways to stop a thread from another thread
    • thread.interrupt() - interrupting a thread has an effect in two scenarios
      1. If the thread is executing a method that throws InterruptedException (e.g. Thread.sleep())
      2. If the thread code handles the interrupt signal explicitly (e.g. by checking Thread.currentThread().isInterrupted())
    • Daemon threads - background threads that do not prevent the application from exiting when the main thread terminates. Another reason to use them: the code in a worker thread is not under our control, and we do not want it to block our application from terminating
  • By default, the application will not exit while at least one non-daemon thread is still running, even if the main thread has finished. So we need to stop all threads gracefully
  • Thread.join() 
    • calling the join() method has a synchronization effect. join() creates a happens-before relationship
    • Happens-before :  This means that when a thread t1 calls t2.join(), then all changes done by t2 are visible in t1 on return. However, if we do not invoke join() or use other synchronization mechanisms, we do not have any guarantee that changes in the other thread will be visible to the current thread even if the other thread has completed.
    • When we invoke the join() method on a thread, the calling thread goes into a waiting state. It remains in a waiting state until the referenced thread terminates.
    • Timed join() is dependent on the OS for timing. So, we cannot assume that join() will wait exactly as long as specified.
  • To avoid the cost of repeatedly creating and destroying threads, use a thread pool (e.g. an ExecutorService from Executors.newFixedThreadPool()).
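The thread basics above can be sketched as follows. This is a minimal illustration, not a complete recipe; the class and method names (`ThreadBasicsDemo`, `interruptSleepingWorker`, `captureUncaughtMessage`, `joinQuietly`) are made up for the example. It creates a thread from a `Runnable`, stops it with `interrupt()` while it is blocked in `Thread.sleep()`, waits for it with `join()`, and installs an uncaught exception handler.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class ThreadBasicsDemo {

    // Waits for t to terminate; join() also makes t's writes visible to the caller (happens-before).
    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // restore the caller's interrupt flag
        }
    }

    // Creates a thread from a Runnable, interrupts it while it sleeps, then joins it.
    // Returns true if the worker saw the interrupt and exited.
    static boolean interruptSleepingWorker() {
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Runnable task = () -> {
            try {
                Thread.sleep(10_000);             // blocking method that throws InterruptedException
            } catch (InterruptedException e) {
                sawInterrupt.set(true);           // clean up and return, which ends the thread
            }
        };
        Thread worker = new Thread(task, "worker");
        worker.start();
        worker.interrupt();                       // ask the worker to stop
        joinQuietly(worker);                      // calling thread waits until the worker terminates
        return sawInterrupt.get();
    }

    // The handler runs on the dying thread before it terminates, so it has finished when join() returns.
    static String captureUncaughtMessage() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread failing = new Thread(() -> {
            throw new IllegalStateException("boom");  // unchecked, not caught inside run()
        });
        failing.setUncaughtExceptionHandler((t, e) -> seen.set(e.getMessage()));  // log or clean up here
        failing.start();
        joinQuietly(failing);
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped by interrupt: " + interruptSleepingWorker());
        System.out.println("handler saw: " + captureUncaughtMessage());
    }
}
```

If the worker is interrupted before it reaches `Thread.sleep()`, the call still throws immediately because the interrupt flag is already set, so the sketch does not depend on timing.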

Data Sharing between Threads
  • Local variables (primitives and object references) are stored on each thread's own stack, so they are not shared
  • Shared data is stored on the heap: objects, class members (fields) and static variables
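  • A critical section is guarded with the synchronized keyword. There are two ways of doing this
    • synchronized at method level - the monitor is the object itself (this), or the Class object for static methods
    • synchronized block inside a method on an explicit object - lock
    • Reentrant - a thread inside a synchronized method/block can enter other synchronized methods/blocks guarded by the same monitor/lock it already holds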
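Both ways of guarding a critical section, plus reentrancy, can be sketched like this. The counter classes and the iteration counts are hypothetical; the point is that each counter is guarded by exactly one monitor, and that `incrementTwice()` can call the synchronized `increment()` without blocking on the monitor it already holds.

```java
public class SynchronizedDemo {

    // Option 1: synchronized methods; the monitor is the MonitorCounter instance (this).
    static class MonitorCounter {
        private int count = 0;

        public synchronized void increment() { count++; }

        // Reentrancy: the thread already holds the monitor of this, so calling
        // another synchronized method on the same object does not deadlock.
        public synchronized void incrementTwice() {
            increment();
            increment();
        }

        public synchronized int get() { return count; }
    }

    // Option 2: synchronized block on an explicit, private lock object.
    static class LockCounter {
        private final Object lock = new Object();
        private int count = 0;

        public void increment() {
            synchronized (lock) { count++; }
        }

        public int get() {
            synchronized (lock) { return count; }
        }
    }

    // Runs the same task on two threads and waits for both.
    static void runOnTwoThreads(Runnable task) {
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static int[] run() {
        MonitorCounter a = new MonitorCounter();
        LockCounter b = new LockCounter();
        runOnTwoThreads(() -> { for (int i = 0; i < 10_000; i++) a.incrementTwice(); });
        runOnTwoThreads(() -> { for (int i = 0; i < 10_000; i++) b.increment(); });
        return new int[] { a.get(), b.get() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]);   // 40000 20000: no lost updates
    }
}
```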



Atomic Operations
  • Object reference assignments - e.g. getters and setters that just read or assign a reference
  • Primitive type assignments, except long and double, because long and double are 64 bits long and a write to them may be performed as two separate 32-bit writes
  • We can declare long and double fields volatile. With volatile, reads and writes of them are guaranteed to be atomic
  • Knowing which operations are atomic is key to creating high-performance applications

Concurrency problems
  • Race condition: two threads work on the same shared object and at least one of them modifies it; depending on how the OS schedules the threads, the result may be incorrect. The core of the problem is a non-atomic operation (e.g. count++) performed on the shared object. Solution: identify the critical section where the race condition happens and protect it with a synchronized block.
    https://stackoverflow.com/questions/34510/what-is-a-race-condition
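  • Data race: the compiler and the CPU may reorder instructions that look independent, so another thread can observe writes in an unexpected order. Solution: establish happens-before semantics by one of these methods
    • synchronizing the methods that read or modify the shared variables
    • declaring the shared variables volatile. Instructions before a volatile write are not reordered after it, and instructions after a volatile read are not reordered before it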
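A minimal sketch of the volatile fix for a data race, assuming Java 9+ for `Thread.onSpinWait()`. The class and field names are hypothetical. The plain write to `data` happens before the volatile write to `ready`, so a reader that sees `ready == true` is guaranteed to also see the value in `data`.

```java
public class VolatilePublicationDemo {
    private int data = 0;                    // plain field
    private volatile boolean ready = false;  // volatile flag that publishes `data`

    // Writer: the write to `data` cannot be reordered after the volatile write to `ready`.
    void publish(int value) {
        data = value;
        ready = true;
    }

    // Reader: once it sees ready == true, it must also see the value written to `data`.
    int awaitAndRead() {
        while (!ready) {
            Thread.onSpinWait();             // busy-wait hint (Java 9+)
        }
        return data;
    }

    static int demo() {
        VolatilePublicationDemo shared = new VolatilePublicationDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = shared.awaitAndRead());
        reader.start();
        shared.publish(42);
        try {
            reader.join();                   // join() makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());          // 42
    }
}
```

Without `volatile` on `ready`, the reader could loop forever (the JIT may hoist the read out of the loop) or see `ready == true` with a stale `data`.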
Locking Strategies
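  • Coarse-grained strategy: lock the whole object with a single lock. Simple, but it might hurt performance because only one thread can access the object at a time
  • Fine-grained strategy: lock each part of the shared object separately, using its own lock object. More concurrency, but a higher risk of deadlock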

Deadlock 
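  • Conditions that lead to deadlock (all four must hold)
    • Mutual exclusion - only one thread can hold a resource at a time
    • Hold and wait - a thread holds at least one resource while waiting for another
    • Non-preemptive allocation - a resource is released only when the holding thread is done with it
    • Circular wait - a chain of threads, each waiting for a resource held by the next
  • The solution to deadlock is to avoid at least one of the conditions mentioned above
    • Avoid circular wait - this is the easiest one: acquire shared resources in the same order everywhere
  • Deadlock detection
    • Watchdog
    • Thread interruption
    • tryLock operation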
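Here is a sketch of the two practical remedies above: a consistent lock order (which removes circular wait) and `tryLock()` (which lets a thread back off instead of waiting forever). The two "accounts" `a` and `b` and the method names are hypothetical. Both transfer methods take `lockA` before `lockB`, even the one that logically moves value from B to A.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingDemo {
    // Fine-grained locking: each resource has its own lock.
    private final ReentrantLock lockA = new ReentrantLock();
    private final ReentrantLock lockB = new ReentrantLock();
    private int a = 0;
    private int b = 0;

    void transferAtoB() {
        lockA.lock();                 // global order: always lockA, then lockB
        try {
            lockB.lock();
            try { a--; b++; } finally { lockB.unlock(); }
        } finally {
            lockA.unlock();
        }
    }

    void transferBtoA() {
        lockA.lock();                 // same order, so no circular wait can form
        try {
            lockB.lock();
            try { b--; a++; } finally { lockB.unlock(); }
        } finally {
            lockA.unlock();
        }
    }

    // Alternative: give up immediately if a lock is busy, instead of blocking.
    boolean tryTransferAtoB() {
        if (!lockA.tryLock()) {
            return false;
        }
        try {
            if (!lockB.tryLock()) {
                return false;         // lockA is released in the finally block below
            }
            try { a--; b++; return true; } finally { lockB.unlock(); }
        } finally {
            lockA.unlock();
        }
    }

    static boolean demo() {
        LockOrderingDemo d = new LockOrderingDemo();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100_000; i++) d.transferAtoB(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100_000; i++) d.transferBtoA(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean balanced = d.a == 0 && d.b == 0;   // no deadlock and no lost updates
        boolean acquired = d.tryTransferAtoB();     // uncontended now, so tryLock() succeeds
        return balanced && acquired && d.a == -1 && d.b == 1;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // true
    }
}
```

If `transferBtoA()` took `lockB` first, the two threads could each hold one lock and wait for the other forever.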
Reentrant Lock
  •  Works like synchronized locking, but provides more control over the lock through advanced operations
  •  Pattern to use it
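    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    class SharedData {
        private final Lock lockObject = new ReentrantLock();

        public void method() {
            lockObject.lock();          // acquire the lock before touching shared state
            try {
                useSharedObject();      // critical section
            } finally {
                lockObject.unlock();    // always release, even if an exception is thrown
            }
        }

        private void useSharedObject() { /* access the shared resources here */ }
    }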
  • To avoid starvation (one thread keeps using the shared object while the others keep waiting), you can pass true to the constructor: new ReentrantLock(true). This is the fairness flag, but it comes at a cost in throughput, so use it only when you really need it.
  • ReentrantLock.lockInterruptibly() - like lock(), but a thread waiting for the lock can be interrupted (it throws InterruptedException)
  • ReentrantLock.tryLock() - tries to acquire the lock and returns false immediately, instead of blocking, if another thread holds it
  • ReentrantReadWriteLock - if our shared object is read-intensive we can use it; otherwise it can perform worse than traditional locks. An example of using a read-write lock is caching, where the system is read-intensive. Multiple reader threads can hold the read lock at the same time (getReadLockCount() reports how many). Only one writer thread can hold the write lock, and while it does, no other writer or reader thread can access the shared object.
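The caching use case for `ReentrantReadWriteLock` might look roughly like this. `ReadWriteCache` is a hypothetical class backed by a plain `HashMap`; reads share the read lock, writes take the exclusive write lock, and `tryPut()` shows `tryLock()` on the write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteCache<K, V> {
    private final Map<K, V> map = new HashMap<>();   // not thread-safe by itself
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    public V get(K key) {
        rwLock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // A writer gets exclusive access: no reader or other writer can hold the lock meanwhile.
    public void put(K key, V value) {
        rwLock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // tryLock(): attempt the write without blocking; returns false if the lock is busy.
    public boolean tryPut(K key, V value) {
        if (!rwLock.writeLock().tryLock()) {
            return false;
        }
        try {
            map.put(key, value);
            return true;
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Number of read locks currently held, e.g. for monitoring.
    public int currentReaders() {
        return rwLock.getReadLockCount();
    }

    public static void main(String[] args) {
        ReadWriteCache<String, Integer> cache = new ReadWriteCache<>();
        cache.put("answer", 42);
        System.out.println(cache.get("answer"));    // 42
        System.out.println(cache.tryPut("x", 1));   // true, nobody else holds the lock
    }
}
```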
Semaphore
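  • Can restrict the number of threads accessing shared data (the number of permits)
  • Similar to a lock, but different in many ways: it can let more than one thread in at a time, and it has no notion of an owner thread, so any thread can release a permit
  • One use case is Producer-Consumer using semaphores. The producer-consumer pattern is used in web sockets, video streaming and actor models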
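A sketch of producer-consumer with semaphores, under the assumption of a single producer and a single consumer sharing a small bounded buffer (`SemaphoreBoundedBuffer` and the capacity of 2 are made up for illustration). `empty` counts free slots, `full` counts available items, and a plain lock protects the deque itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Semaphore;

public class SemaphoreBoundedBuffer {
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private final Semaphore empty;                    // permits = free slots
    private final Semaphore full = new Semaphore(0);  // permits = items available
    private final Object lock = new Object();         // guards the deque

    SemaphoreBoundedBuffer(int capacity) {
        this.empty = new Semaphore(capacity);
    }

    void produce(int item) throws InterruptedException {
        empty.acquire();                  // wait for a free slot
        synchronized (lock) { buffer.addLast(item); }
        full.release();                   // one more item available
    }

    int consume() throws InterruptedException {
        full.acquire();                   // wait for an item
        int item;
        synchronized (lock) { item = buffer.removeFirst(); }
        empty.release();                  // one more free slot
        return item;
    }

    static List<Integer> demo(int count) {
        SemaphoreBoundedBuffer buf = new SemaphoreBoundedBuffer(2);  // small, so the producer must wait
        List<Integer> consumed = new ArrayList<>();                  // touched only by the consumer
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) buf.produce(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) consumed.add(buf.consume());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(demo(5));   // [0, 1, 2, 3, 4]
    }
}
```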
Condition variable


Other methods
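  • wait() - releases the object's monitor and waits until another thread calls notify()/notifyAll() on the same object
  • notify() and notifyAll() - wake up one / all threads waiting on the object. All three methods must be called while holding the object's monitor (inside synchronized)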
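A one-slot hand-off is a small way to see `wait()`/`notifyAll()` in action. The class `WaitNotifyDemo` and its methods are hypothetical. Waiting always happens in a `while` loop, because a thread can wake up spuriously or find the condition already changed. A `Condition` from `Lock.newCondition()` plays the same role with `await()`/`signalAll()`.

```java
public class WaitNotifyDemo {
    private Integer slot = null;   // guarded by the monitor of this

    public synchronized void put(int value) throws InterruptedException {
        while (slot != null) {     // wait until the slot is empty
            wait();                // releases the monitor while waiting
        }
        slot = value;
        notifyAll();               // wake threads waiting in take()
    }

    public synchronized int take() throws InterruptedException {
        while (slot == null) {     // wait until a value is available
            wait();
        }
        int value = slot;
        slot = null;
        notifyAll();               // wake threads waiting in put()
        return value;
    }

    // Hands the numbers 1..n to a consumer thread and returns what it summed up.
    static int sumOfHandoffs(int n) {
        WaitNotifyDemo box = new WaitNotifyDemo();
        int[] sum = new int[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += box.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 1; i <= n; i++) box.put(i);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumOfHandoffs(10));   // 55
    }
}
```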
Lock free programming
  • AtomicInteger, AtomicLong...
  • AtomicReference
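A sketch of both ideas: `AtomicInteger.incrementAndGet()` as a lock-free counter, and `AtomicReference.compareAndSet()` used in a retry loop to build a lock-free stack (the classic Treiber stack). `LockFreeDemo`, `LockFreeStack` and `countConcurrently` are names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeDemo {

    // Immutable node: once published, it never changes.
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    public static final class LockFreeStack<T> {
        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void push(T value) {
            Node<T> oldHead;
            Node<T> newHead;
            do {
                oldHead = head.get();
                newHead = new Node<>(value, oldHead);
            } while (!head.compareAndSet(oldHead, newHead));   // retry if another thread changed head
        }

        public T pop() {
            Node<T> oldHead;
            do {
                oldHead = head.get();
                if (oldHead == null) {
                    return null;                                 // empty stack
                }
            } while (!head.compareAndSet(oldHead, oldHead.next));
            return oldHead.value;
        }
    }

    static int countConcurrently(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();                   // atomic read-modify-write, no lock
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));   // 40000
        LockFreeStack<String> stack = new LockFreeStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());                    // b
    }
}
```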




Tuesday, May 26, 2020

Java Exceptions



  • RuntimeException, Error and their subclasses are unchecked exceptions; all other exceptions are checked exceptions
  • Always use try-with-resources. To be used with it, objects must implement AutoCloseable
  • Exceptions are slow: throwing and catching one costs far more than a simple test, while code inside a try block that throws nothing runs at normal speed. So if you have the chance, use a simple test (like if (!s.empty()) s.pop(); ) rather than catching the exception
  • Throw early, catch late
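A small sketch of the first three points. `TrackedResource` is a hypothetical resource that counts how many instances are open, which makes it visible that try-with-resources calls `close()` even when the body throws. `popOrNull` shows the "simple test instead of an exception" advice on a `Deque` rather than the `Stack` mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExceptionsDemo {

    // Any class implementing AutoCloseable can be used in try-with-resources.
    static final class TrackedResource implements AutoCloseable {
        static int openCount = 0;
        TrackedResource() { openCount++; }

        void use(boolean fail) {
            if (fail) {
                throw new IllegalStateException("failure while using the resource");
            }
        }

        @Override
        public void close() { openCount--; }   // called automatically, even when use() throws
    }

    static int openAfterFailure() {
        try (TrackedResource r = new TrackedResource()) {
            r.use(true);
        } catch (IllegalStateException e) {
            // close() has already run by the time this handler executes
        }
        return TrackedResource.openCount;
    }

    // A cheap check instead of catching NoSuchElementException for ordinary control flow.
    static Integer popOrNull(Deque<Integer> stack) {
        if (!stack.isEmpty()) {
            return stack.pop();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(openAfterFailure());        // 0
        Deque<Integer> stack = new ArrayDeque<>();
        System.out.println(popOrNull(stack));          // null
    }
}
```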










Java hashCode()

  • Objects that are equal must have the same hash code within a running process
  • Whenever you implement equals, you MUST also implement hashCode
  • Whenever two different objects have the same hash code, we call this a collision.
  • A collision is nothing critical, it just means that there is more than one object in a single
    bucket, so a HashMap lookup has to look again to find the right object. A lot of collisions will degrade the performance of a system, but they won’t lead to incorrect results.
  • It is nice when a program generates the same hash codes in different executions, but you should not rely on it. String and Integer always produce the same hash code, and most hashCode() implementations provide stable values, but in general you must not rely on this. There are Java libraries that actually return different hashCode values in different processes, and this tends to confuse people. Google’s Protocol Buffers is an example.
  • Do not use hashCode() values in distributed applications, e.g. as identifiers shared between processes or machines
  • You may know that cryptographic hash codes such as SHA-1 are sometimes used to identify objects (Git does this, for example). Is this also unsafe? No. SHA-1 produces 160-bit digests, which makes accidental collisions virtually impossible. Even with a gigantic number of objects, the odds of a collision in this space are far below the odds of a meteor crashing the computer that runs your program.
  • A cryptographic hash such as MD5 or SHA-1 would be ok for many cases, but might be a bit heavyweight if you’re dealing with a really high-throughput service.
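The equals/hashCode contract can be sketched with a hypothetical immutable `Point` class: `hashCode()` uses exactly the fields that `equals()` compares, so equal points always land in the same bucket and a `HashSet` can find them.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Equal objects must have equal hash codes: hash the same fields equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        // A different but equal instance is found, because it hashes to the same bucket.
        System.out.println(set.contains(new Point(1, 2)));   // true
    }
}
```

If `hashCode()` were left out, `contains` would most likely return false, because the default identity hash of the second instance usually differs from the first.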



How does HashMap work in Java

  • An internal array (the bucket table) is created with a default capacity of 16
  • Then the hash code of the key is computed with key.hashCode()
  • It rehashes the hash code to protect against a bad hashing function in the key that would put all data in the same index (bucket) of the inner array
  • It takes the rehashed hash code and bit-masks it with the length of the array minus 1. This operation ensures that the index can’t be greater than the size of the array. You can see it as a very computationally optimized modulo function (it works because the length is always a power of two).

  • It finds the appropriate array index for the hash code and saves the entry in the bucket associated with this index
  • In Java 8, if a bucket holds more than 8 entries, it is automatically converted from a linked list to a red-black tree (provided the table has at least 64 buckets; otherwise the table is resized instead)
  • The map resizes itself according to the load factor: when the number of entries exceeds capacity × load factor, the array doubles in size. The default initial capacity is 16 and the load factor is 0.75, so the first resize happens when the 13th entry is added
  • HashMap is not thread-safe. Hashtable is thread-safe, but it locks the whole data structure during concurrent access. ConcurrentHashMap, on the other hand, locks only the affected bucket during updates
  • Integer and String are mostly used as map keys because they are immutable and provide a strong hashCode() function
  • If you have a lot of data to put in a map, it is advisable to create the map with an approximate, high enough initial capacity, because each resize has the overhead of rehashing all entries into a larger array
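The rehash and bit-mask steps can be sketched in a few lines. `spread()` mirrors the "XOR the high 16 bits into the low 16 bits" step of OpenJDK 8+ `HashMap.hash()`; the class and method names are otherwise made up, and this only computes indexes, it is not a map implementation.

```java
public class HashIndexDemo {

    // Rehash step: mix the high 16 bits into the low 16 bits, so keys whose
    // hash codes differ only in the high bits do not all land in one bucket.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // capacity is a power of two, so (capacity - 1) is a bit mask and
    // `hash & (capacity - 1)` equals a non-negative `hash mod capacity`.
    static int bucketIndex(Object key, int capacity) {
        return spread(key) & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (Object key : new Object[] {1, 17, 33, "apple"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, capacity));
        }
        // The Integer keys 1, 17 and 33 all map to bucket 1: a collision, stored in the same bucket.
    }
}
```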






Java Collections





Iterable Interface

The Iterable interface is the root interface of all the collection classes: the Collection interface extends Iterable, so every class that implements Collection also implements Iterable.
The Iterable interface contains only one abstract method.
  • Iterator<T> iterator(): returns an iterator over the elements of type T.

Iterator Interface

The iterator interface provides the facility of iterating the elements in a forward direction only.

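// Simplified view of java.util.Iterator (Java 8+)
import java.util.function.Consumer;

public interface Iterator<E> {
        boolean hasNext();
        E next();
        default void remove() { throw new UnsupportedOperationException("remove"); }  // optional operation
        default void forEachRemaining(Consumer<? super E> action) {
            while (hasNext()) action.accept(next());   // applies action to each remaining element
        }
}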
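To make a class usable in a for-each loop, it only has to implement `Iterable` and return an `Iterator`. The sketch below uses a hypothetical `Range` class (start inclusive, end exclusive); it does not override `remove()`, so that call keeps the default behavior and throws `UnsupportedOperationException`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class Range implements Iterable<Integer> {
    private final int start;   // inclusive
    private final int end;     // exclusive

    public Range(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int current = start;

            @Override
            public boolean hasNext() {
                return current < end;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current++;   // forward direction only
            }
        };
    }

    public static void main(String[] args) {
        int sum = 0;
        for (int i : new Range(1, 5)) {   // the for-each loop calls iterator(), hasNext() and next()
            sum += i;
        }
        System.out.println(sum);          // 10
    }
}
```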

Collection Interface

The Collection interface builds the foundation of the Collection framework. It is implemented by all the collection classes (maps form a separate hierarchy) and declares the common methods that all of them provide.
public interface Collection<E> {
        boolean add(E element);
        Iterator<E> iterator();
        int size();
        boolean isEmpty();
        boolean contains(Object obj);
        boolean containsAll(Collection<?> c);
        boolean equals(Object other);
        boolean addAll(Collection<? extends E> from);
        boolean remove(Object obj);
        boolean removeAll(Collection<?> c);
        void clear();
        boolean retainAll(Collection<?> c);
        Object[] toArray();
        <T> T[] toArray(T[] arrayToFill);
        // ...
}


Concrete Collections


List Interface

ArrayList and LinkedList implement this interface. The indexed get and set methods perform very differently in the two, due to the nature of the array and linked-list data structures: an array reaches any index directly, while a linked list has to walk the links from one end. The Java language designers added the RandomAccess tagging (marker) interface in order to distinguish between these two.

public interface List<E> {
        void add(int index, E element);
        E remove(int index);
        E get(int index);
        E set(int index, E element);
}
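The RandomAccess marker can be checked at run time to pick a traversal strategy. This is an illustrative sketch (`RandomAccessDemo.sum` is a hypothetical helper, and `List.of` needs Java 9+): indexed `get(i)` is used only when the list advertises fast random access, otherwise the iterator is used.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class RandomAccessDemo {

    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) {   // get(i) is fast for ArrayList
                total += list.get(i);
            }
        } else {
            for (int value : list) {                  // iterator avoids a walk per get(i) on LinkedList
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> linked = new LinkedList<>(List.of(1, 2, 3));
        System.out.println(array instanceof RandomAccess);    // true
        System.out.println(linked instanceof RandomAccess);   // false
        System.out.println(sum(array) + " " + sum(linked));   // 6 6
    }
}
```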



Set Interface

* Usually implemented by HashSet and TreeSet classes
* TreeSet visits elements in sorted order
* In HashSet, if someone provides a poor hashing algorithm, it can be slower. On the other hand, TreeSet performance is guaranteed (O(log n) per operation), but you have to provide a Comparator or implement the compareTo method
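A TreeSet with a custom `Comparator` might look like this (a sketch; the words and the helper name are made up). Note the tie-breaker: a TreeSet treats `compare(a, b) == 0` as "same element", so ordering by length alone would silently drop words of equal length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDemo {

    static List<String> sortedByLengthThenAlpha(List<String> words) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        // Without the tie-breaker, "kiwi" and "pear" (both length 4) would collapse into one element.
        Comparator<String> order = byLength.thenComparing(Comparator.<String>naturalOrder());
        Set<String> set = new TreeSet<>(order);
        set.addAll(words);               // duplicates ("fig") are ignored
        return new ArrayList<>(set);     // a TreeSet iterates in sorted order
    }

    public static void main(String[] args) {
        System.out.println(sortedByLengthThenAlpha(List.of("pear", "fig", "apple", "kiwi", "fig")));
        // [fig, kiwi, pear, apple]
    }
}
```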

Queue  Interface

* Queue lets you efficiently add at the tail and remove from the head
* Deque can add/remove at both ends
* PriorityQueue isn't a FIFO queue
    -  it doesn't remember in which order elements were added
    -  when removing, the highest-priority element (the smallest according to compareTo() or the Comparator) is removed first
    -  useful for work scheduling
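A work-scheduling sketch with `PriorityQueue`. The `Job` class and the convention "lower number = more urgent" are assumptions made for this example; jobs come out by priority, not in insertion order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PriorityQueueDemo {

    static final class Job {
        final String name;
        final int priority;   // lower number = more urgent (assumption for this sketch)

        Job(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    static List<String> runOrder() {
        PriorityQueue<Job> queue = new PriorityQueue<>(Comparator.comparingInt((Job j) -> j.priority));
        queue.add(new Job("backup", 3));
        queue.add(new Job("hotfix", 1));
        queue.add(new Job("report", 2));

        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.remove().name);   // always removes the smallest (most urgent) job
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runOrder());   // [hotfix, report, backup]
    }
}
```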




Concurrent Modification
Suppose one iterator traverses a collection while another modifies the collection by adding/removing an element. In the case of a linked list, that won't work: the links will no longer be consistent. LinkedList detects the concurrent modification and throws ConcurrentModificationException.

Collections that detect this keep a modification count. To find out which collections do, check the Java API documentation. This behavior is also called fail-fast.
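A sketch of the fail-fast behavior and the usual fix, using an `ArrayList` (whose iterator is fail-fast in the same way). Removing through the list inside a for-each loop changes the modification count behind the iterator's back; removing through the iterator itself keeps them consistent.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {

    static boolean removeInsideForEachFails() {
        List<String> names = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String name : names) {
                if (name.equals("a")) {
                    names.remove(name);   // structural change not made through the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;                  // the next call to next() detected the change
        }
    }

    // Safe alternative: remove through the iterator (or use names.removeIf(...)).
    static List<String> removeWithIterator() {
        List<String> names = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(removeInsideForEachFails());   // true
        System.out.println(removeWithIterator());         // [b, c]
    }
}
```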



Maps

* HashMap hashes the keys, TreeMap organizes them in sorted order
* map.get(id) returns null if the key doesn't exist, so you need to check the value. To avoid this you can use map.getOrDefault(id, defaultValue), which returns defaultValue if the key is absent

* Easiest way to iterate over a map: map.forEach((k, v) -> doSomething(k, v))

* Updating map entries
  • map.put(word, map.get(word) + 1) - throws NullPointerException if word is not present yet
  • If the key may not be present, you can use map.put(word, map.getOrDefault(word, 0) + 1)
  • map.putIfAbsent(word, 0), then map.put(word, map.get(word) + 1)
  • map.merge(word, 1, Integer::sum) - if word wasn't present, put 1; otherwise, put the sum of 1 and the previous value
* LinkedHashMap traverses the entries in the order in which they were added
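A word-count sketch comparing two of the update styles above; the sample sentence and the class name are made up. Both methods should produce the same map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class WordCountDemo {

    static Map<String, Integer> countWithMerge(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);   // put 1 if absent, else old value + 1
        }
        return counts;
    }

    static Map<String, Integer> countWithGetOrDefault(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = "to be or not to be".split(" ");
        // Copy into a TreeMap only to print the keys in sorted order.
        new TreeMap<>(countWithMerge(words)).forEach((k, v) -> System.out.println(k + "=" + v));
        // be=2, not=1, or=1, to=2
    }
}
```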

Views

* A view implements a collection interface without storing the elements. Examples :

  Collection<String> hellos = Collections.nCopies(100, "Hello");  // creates the illusion of 100 "Hello"s

  Collection<String> greeting = Collections.singleton("Hello");

  Collection<String> nothing = Collections.emptySet();

  List<Employee> list = staff.subList(10, 20);  // view of elements 10..19 of staff

Restricted Views

Collections.unmodifiableCollection
Collections.unmodifiableList
Collections.unmodifiableSet
Collections.unmodifiableSortedSet
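Collections.unmodifiableNavigableSet
Collections.unmodifiableMap

* look but don't touch: mutator methods throw UnsupportedOperationException
* Synchronized views (Collections.synchronizedList, Collections.synchronizedMap, ...) provide safe concurrent access, but you should prefer a thread-safe collection instead.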
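A short sketch of an unmodifiable view (the names are made up). Because it is a view and not a copy, changes to the underlying list show through, while writes through the view are rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ViewsDemo {

    static String demo() {
        List<String> staff = new ArrayList<>(List.of("Ann", "Bob"));
        List<String> readOnly = Collections.unmodifiableList(staff);   // a view, not a copy

        staff.add("Cid");                       // visible through the view
        StringBuilder result = new StringBuilder(readOnly.toString());

        try {
            readOnly.add("Dan");                // look but don't touch
        } catch (UnsupportedOperationException e) {
            result.append(" read-only");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [Ann, Bob, Cid] read-only
    }
}
```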


  







Practical

List<String> names = Arrays.asList("A", "B", "C");

Collection literals like these were proposed during Java 7 development (Project Coin) but never made it into the language, so they do not compile:
List<Integer> digits = [1,2,3,4,5,6];
Set<Integer> digits = {1,2,3,4,5,6};

In Java 9
List<Integer> digits = List.of(1, 2, 3, 4, 5, 6);

Set<Integer> digits = Set.of(1, 2, 3, 4, 5, 6);


Map<Integer, String> map = {4 : "ab", 5 : "bc", 6 : "ce"};   // proposed map literal, does not compile
In Java 9
Version 1:
Map<Integer, String> map = Map.of(4, "ab", 5, "bc", 6, "ce");
OR
Version 2:
import static java.util.Map.*;
Map<Integer, String> map = ofEntries(entry(4, "a"), entry(5, "b"), entry(6, "d"));
* Version 1 works only for up to 10 key-value pairs; use Version 2 for more

Collection to Arrays
String[] names = collection.toArray( new String[collection.size()]);

References 

Book :  Core Java 11 Fundamentals, Second Edition by Cay S. Horstmann




Sunday, May 24, 2020

Unit vs Integration Testing

* Developers should run unit tests and then commit the code. (best practice)

* One of the golden rules of unit testing is that your tests should cover code with “business logic”.

The “business logic” part is where you should focus your testing efforts. This is the part of the code where most bugs usually manifest. It is also the part that changes a lot as user requirements change, since it is specific to your application.

* So what happens if you get across a legacy application with no unit tests? What if the “business logic” part ends up being thousands of lines of code? Where do you start?
In this case you should prioritize things a bit and just write tests for the following:
1. Core code that is accessed by a lot of other modules
2. Code that seems to gather a lot of bugs
3. Code that is changed by multiple different developers (often to accommodate new requirements)
How much of the code in these areas should we test, you might ask. Well, now that we know which areas to focus on, we can now start to analyze just how much testing we need to feel confident about our code.




Reference :
https://zeroturnaround.com/author/kostis-kapelonis/


Run MySQL as docker container

1. docker pull mysql

2. docker run -p 3306:3306 --name mysqlimage -e MYSQL_ROOT_PASSWORD=abc123  -d  mysql

To connect from MySQL Workbench, use one of the following hosts:
- localhost
- the container's IP address, found with: docker inspect CONTAINER_ID | grep "IPAddress"

Code review



Best Article : https://www.processimpact.com/articles/humanizing_reviews.pdf

Detailed  : https://medium.com/palantir/code-review-best-practices-19e02780015f

Stats : https://blog.codinghorror.com/code-reviews-just-do-it/?source=post_page

Book : https://www.amazon.com/exec/obidos/ASIN/0201734850/codihorr-20

Best Practices : https://github.com/palantir/gradle-baseline/tree/develop/docs