Saturday, January 30, 2016

Spam Filter Challenge

Adaptation of Adversaries [1]

  • The adversaries are motivated to transform the test data to reduce the learner's effectiveness. 
  • Spam filter designers
    • Attempt to learn good filters by training their algorithms on Spam (and legitimate) email messages received in the recent past. 
  • Spammers
    • Are motivated to reverse-engineer existing Spam filters and use this knowledge to generate messages which are different enough from the (inferred) training data to circumvent the filters. 

Solutions


  • Increase the robustness of the learning algorithm to generic training/test data differences via standard methods such as regularization or minimization of worst-case loss [1]
    • However, these techniques do not account for the adversarial nature of the training/test set discrepancies and may be overly conservative.
  • Predictive analytics to anticipate and counter the adversaries [1]
    • For example, predictions can be made using extrapolation or game-theoretic considerations, and can be employed to transform training instances so that they become similar to (future) test data and therefore provide a more appropriate basis for learning.
  • Time-varying posture to increase uncertainty [1]
    • Pros
      • This approach is flexible, scalable, easy to implement, and hard to reverse-engineer.

Reference

[1] Moving Target Defense for Adaptive Adversaries, by Richard Colbaugh and Kristin Glass, in ISI 2013.

Monday, January 25, 2016

Covert Channel Tutorial


High-Speed Covert Channel Attacks in the Cloud 





  • The challenges in conducting covert channel attacks
    • scheduling uncertainty
    • address uncertainty
    • cache physical limitations
      • Different VMs may not share cache
      • This can be overcome by exploiting the atomic operation implementation in the system
        • i.e., the memory bus is locked when an atomic operation operates on data spanning cache lines, an effect visible to all cores regardless of cache sharing
  • Memory-bus based covert channel attacks (a minimal sketch follows)
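
A minimal sketch of the bus-locking trick (assuming x86-64 Linux and GCC's __sync builtins; the buffer layout and iteration count are illustrative): an atomic read-modify-write on a value straddling two cache lines locks the memory bus, and a co-resident receiver can observe the resulting memory-latency spike.

    #include <stdint.h>
    #include <stdlib.h>

    #define CACHE_LINE 64

    int main(void) {
        /* Place a 4-byte value so it straddles a cache-line boundary
           (2 bytes before the end of a 64-byte-aligned line). */
        char *buf = aligned_alloc(CACHE_LINE, 2 * CACHE_LINE);
        volatile uint32_t *straddler = (volatile uint32_t *)(buf + CACHE_LINE - 2);

        /* Sender: a burst of locked atomic operations signals a "1";
           staying idle signals a "0". */
        for (int i = 0; i < 100000; i++)
            __sync_fetch_and_add(straddler, 1);

        free(buf);
        return 0;
    }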

Friday, January 22, 2016

Unix Command

Procedure

  • For example, "rm file.txt"
    • The shell would search for a file called rm, load it into memory, and execute it with the parameter file.txt.

User Mode and Kernel Mode

Mode Bit

  • A bit, called the mode bit, is added to the hardware of the computer to indicate the current mode.
    • 0: kernel mode
      • When a task is executed on behalf of the operating system
    • 1: user mode
      • When a task is executed on behalf of the user

Switch

  • Whenever a trap or interrupt occurs, the hardware switches from user mode to kernel mode
    • i.e., change mode bit to 0
  • When a user application requests a service from the operating system (via a system call)
    • It must transition from user to kernel mode to fulfill the request
  • Privileged Instructions
    • The hardware allows privileged instructions to be executed only in kernel mode. 
    • The instruction to switch to kernel mode is an example of a privileged instruction. Some other examples include I/O control, timer management, and interrupt management. 

Privileges

Escalate Privileges

  • For example, on Unix, the setuid attribute on a program causes that program to run with the user ID of the owner of the file, rather than the current user's ID.
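
A small illustration of the effect (standard POSIX calls; the chmod usage in the comment is the conventional way to set the bit): the real UID stays the caller's while the effective UID becomes the file owner's.

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* On a binary owned by root with the setuid bit set (chmod u+s),
           the effective UID is 0 while the real UID remains the caller's. */
        printf("real uid: %d, effective uid: %d\n",
               (int)getuid(), (int)geteuid());
        return 0;
    }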


Virtual Memory

Advantage

  • It allows the execution of a process that is not completely in memory, which enables users to run programs that are larger than actual physical memory.
  • In addition, it abstracts main memory into a large, uniform array of storage, separating logical memory as viewed by the user from physical memory. This arrangement frees programmers from concern over memory-storage limitations. 


Thursday, January 21, 2016

Driver

Device Controller

  • A device controller is in charge of the devices. 
  • It maintains 
    • some local buffer
    • and a set of special purpose registers
  • It is responsible for moving the data between the peripheral devices that it controls and its local buffer storage.

Device Driver

  • The operating system has a device driver for each device controller
  • This device driver understands the device controller and presents a uniform interface to the device to the rest of the operating system


Interrupt-Driven I/O

  • Procedure
    • The device driver loads the appropriate registers within the device controller.
    • The device controller, in turn, examines the contents of these registers to determine what action to take.
    • The controller starts the transfer of the data from the device to its local buffer.
    • Once the transfer of data is complete, the device controller informs the device driver via an interrupt that it has finished its operation.
    • The device driver then returns control to the operating system, possibly returning the data or a pointer to the data if the operation was a read.
  • Drawback
    • High overhead when used for bulk data movement such as disk I/O.
    • To solve this problem, direct memory access (DMA) is used.

Direct Memory Access (DMA)

  • After setting up buffers, pointers, and counters for the I/O device, the device controller transfers an entire block of data directly to or from its own buffer storage to memory, with no intervention by the CPU.

Interrupt

Introduction


The occurrence of an event is usually signaled by an interrupt from either the hardware or the software.
  • Hardware
    • Hardware may trigger an interrupt at any time by sending a signal to the CPU, usually by way of the system bus.
  • Software (called a trap)
    • Software may trigger an interrupt by executing a special operation called a system call.
    • A trap can also be caused by an error, e.g., division by zero or an invalid memory access.


Return Address

  • Before the interrupt
    • Before entering the interrupt handler, the return address is stored on the system stack.
  • After the interrupt
    • After the interrupt is serviced, the saved return address is loaded into the program counter, and the interrupted computation resumes as though the interrupt had not occurred.

Bootstrap

Bootstrap


When a computer starts running, i.e., when it is powered up or rebooted, it needs an initial program to run.

This initial program, or bootstrap program, tends to be simple. Typically, it is stored in read-only memory (ROM) or electrically erasable programmable read-only memory (EEPROM), known by the general term firmware, within the computer hardware.

Computer Storage Unit

Bit

A bit is the basic unit of computer storage. It can contain one of two values, zero or one.


Byte

A byte is 8 bits, and on most computers it is the smallest convenient chunk of storage. 
  • For example, most computers don't have an instruction to move a bit but do have one to move a byte

Word

A word is generally made up of one or more bytes. 
  • For example, a computer may have instructions to move 64-bit (8-byte) words.

KiloByte

  • 1024 Bytes

MegaByte

  • (1024)^2 Bytes
    • Usually rounded off to 1 million bytes


GigaByte

  • (1024)^3 Bytes
    • Usually rounded off to 1 billion bytes

Wednesday, January 20, 2016

Session

Definition

A session is a semi-permanent interactive information interchange between two or more communication devices.


Implementation

The session is implemented by means of a database with information about the state of each session.

The reason for not implementing it with multiple processes or threads is the large resource overhead that would involve.


Cluster of Servers

When a client may connect to any server in a cluster of servers, it is important to maintain consistency.

The client must be either directed to the same server for the duration of the session, or the servers must transmit server-side session information via a shared file system or database.


Client Side Web Sessions

Client-side sessions use cookies and cryptographic techniques to maintain state without storing as much data on the server. 

When presenting a dynamic web page, the server sends the current state data to the client (web browser) in the form of a cookie. The client saves the cookie in memory or on disk. 

With each successive request, the client sends the cookie back to the server, and the server uses the data to "remember" the state of the application for that client and generate an appropriate response.


Issues

  • Browsers limit the number and size of cookies that a web site may store.


HTTP Session Token

  • A session token is a unique identifier that is generated and sent from a server to a client to identify the current interaction session. The client usually stores and sends the token as an HTTP cookie and/or sends it as a parameter in GET or POST queries. 
The reasons for using session tokens are that
  • The client only has to handle the identifier
  • All session data is stored on the server (usually in a database).





Completely Fair Scheduler


Introduction

CFS uses a concept called "sleeper fairness",

  • which considers sleeping or waiting tasks equivalent to those on the runqueue.
  • As a result, interactive tasks which spend most of their time waiting for user input or other events get a comparable share of CPU time when they need it.

Algorithm

  • The data structure used for the scheduling algorithm is a red-black tree in which the nodes are scheduler-specific structures, entitled "sched_entity". 
    • These are derived from the general task_struct process descriptor, with added scheduler elements. These nodes are indexed by processor execution time in nanoseconds. 
    • Ideal Processor
      • An "ideal processor" would equally share processing power among all processes
    • Maximum execution time
      • A maximum execution time is also calculated for each process, based upon the idea that an "ideal processor" would equally share processing power among all processes.
      • Thus the maximum execution time is the time the process has been waiting to run, divided by the total number of processes; in other words, it is the time the process would have expected to run on an "ideal processor"
  • When the scheduler is invoked to run a new process, the operation of the scheduler is as follows (a sketch follows the list)
    • The leftmost node of the scheduling tree is chosen (as it will have the lowest spent execution time), and sent for execution
    • If the process simply completes execution, it is removed from the system and the scheduling tree
    • If the process reaches its maximum execution time or is otherwise stopped (voluntarily or via interrupt), it is reinserted into the scheduling tree based on its new spent execution time
    • The new leftmost node will then be selected from the tree, repeating the iteration. 
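
A toy sketch of that loop (not the kernel's actual code; the red-black tree is replaced by a linear scan, and the 1 ms slice is illustrative):

    #include <stdio.h>

    struct sched_entity {
        const char *name;
        unsigned long vruntime;   /* spent execution time, in nanoseconds */
    };

    /* Stand-in for picking the leftmost node of the red-black tree. */
    static struct sched_entity *pick_next(struct sched_entity *tasks, int n) {
        struct sched_entity *leftmost = &tasks[0];
        for (int i = 1; i < n; i++)
            if (tasks[i].vruntime < leftmost->vruntime)
                leftmost = &tasks[i];
        return leftmost;
    }

    int main(void) {
        struct sched_entity tasks[] = { {"A", 0}, {"B", 0}, {"C", 0} };
        for (int tick = 0; tick < 6; tick++) {
            struct sched_entity *t = pick_next(tasks, 3);
            printf("tick %d: run %s (vruntime=%lu)\n", tick, t->name, t->vruntime);
            t->vruntime += 1000000;   /* charge a 1 ms slice, then "reinsert" */
        }
        return 0;
    }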


Reference

[1] https://en.wikipedia.org/wiki/Completely_Fair_Scheduler

CPU Components

Components in a CPU


General purpose Registers

  • Register
    • A synonym for memory in computer science
  • General purpose register
    • A memory cell
  • Each general purpose register has a unique name
  • It is used to 
    • store (and recall) intermediate result of complex computations

Arithmetic and Logic Unit (ALU)

  • It is a complex electrical circuit that can perform mathematical (+, -, ×, /) and logical (<, <=, >, >=, and, or) operations
  • The output (result) of the computation (obtained by the ALU) is often stored in a general purpose register

Instruction Register (IR)

  • Contains the current instruction being executed by the CPU
  • The CPU will perform the operation indicated by the instruction code contained in the instruction register

Program Counter (PC)

  • The program counter is a register (memory cell)!
  • This register contains the address (location in memory) of the next instruction after the CPU finishes executing the current instruction in the instruction register
  • The value in the program counter will be increased after the CPU finishes executing one instruction

Processor Status Register (PSR)

  • This register contains the various information about the CPU.
  • Among the information contained in the PSR is
    • The result of a comparison operation
  • When the CPU compares two numbers a and b, the outcome of the comparison is stored in the PSR
  • The outcome of a compare operation allows the CPU to determine the relation between a and b (e.g., whether a < b, a = b, or a > b)

Reference

[1] http://www.mathcs.emory.edu/~cheung/Courses/170/Syllabus/01/intro-computer2.html

Process Context Switch

Context Switch

  • Whenever an interrupt arrives, the CPU must do a state-save of the currently running process, then switch into kernel mode to handle the interrupt, and then do a state-restore of the interrupted process. 
  • Context switch happens when
    • the time slice for one process has expired and a new process is to be loaded from the ready queue.
    • This will be instigated by a timer interrupt, which will then cause the current process' state to be saved and the new process's state to be restored.
    • Saving and restoring states involves saving and restoring all registers and program counter(s), as well as the process control blocks.
    • Context switches happen VERY VERY frequently, and the overhead of doing the switching is just lost CPU time, so context switches (state saves and restores) need to be as fast as possible. Some hardware has specific provisions for speeding this up, such as a single machine instruction for saving or restoring all registers at once. 

Registers

  • The operating system needs to store a copy of the CPU registers to memory.
  • When it is time for the process to give up the processor so another process can run, its current state needs to be saved.
  • Equally, we need to be able to restore this state when the process is given more time to run on the CPU. To do this, the operating system stores a copy of the CPU registers to memory.
  • When it is time for the process to run again, the operating system copies the register values back from memory to the CPU registers, and the process is right back where it left off. 

Reference

[1] http://www.bottomupcs.com/elements_of_a_process.xhtml

Process Heap

Introduction


The heap is an area of memory that is managed by the process for on-the-fly memory allocation.

This is for variables whose memory requirements are not known at compile time.



  • brk
    • The bottom of the heap is known as the brk, so called for the system call which modifies it.
    • By using the brk call to grow the area downward, the process can request that the kernel allocate more memory for it to use.
  • malloc
    • The heap is mostly managed by the malloc library call.
    • This makes managing the heap easy for the programmer, by allowing them to simply allocate and free (via the free call) heap memory.
    • Malloc can use schemes like a buddy allocator to manage the heap memory for the user.
    • Malloc can also be smarter about allocation and potentially use anonymous mmaps for extra process memory.
    • That is, instead of mmaping a file into the process memory, it directly maps an area of system RAM.
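
A small illustration (Linux/glibc assumed; sbrk(0) reports the current program break): a small malloc typically moves the break, while large allocations may be served by anonymous mmap instead.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        void *before = sbrk(0);   /* current program break */
        char *p = malloc(4096);   /* small allocation, served from the heap */
        void *after = sbrk(0);    /* break has moved to grow the heap */

        printf("break before: %p\nbreak after:  %p\n", before, after);
        free(p);
        return 0;
    }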

Process Stack

Stacks usually grow down, i.e., the stack starts at a higher address in memory and progressively gets lower.




Stack Pointer


  • To keep track of current growth of the stack, the hardware defines a register as the stack pointer. 
  • The compiler (or the programmer, when writing in assembler) uses this register to keep track of the current top of the stack. 
  • Store
    • It is stored in the register ESP.


Call Stack

  • A call stack is used for several related purposes, but the main reason for having one is to keep track of the point to which each active subroutine should return control when it finishes executing.
  • Active subroutine
    • An active subroutine is one that has been called but is yet to complete execution, after which control should be handed back to the point of the call. 

How it works?

  • Return address
    • Store the return address somewhere, so that we can continue executing the current function after completion of the called function.
  • Other info
    • Saving other information about the current function. 
  • Parameters
    • Providing the callee with the parameters
  • Providing the called function some space to store its automatic variables
  • Return value
    • Providing some mechanism to get the return value from the called function. 
    • It is implemented using the EAX register: the called function stores its return value in EAX.

Execution in Steps

  1. Push the parameters in reverse order (right to left)
  2. Call the function
    1. This implicitly pushes the return address onto the stack
  3. Push the current EBP
    1. i.e., push the Frame Pointer (FP) of the calling function onto the stack. We need to do this to continue with the execution of the calling function after returning from the called function.
  4. Copy the ESP to EBP
    1. This location will be the new Frame Pointer
  5. Allocate space on the stack for local variables
    1. This is done by decrementing ESP
  6. Put the value to be returned in EAX
  7. Copy the current EBP to ESP
    1. This reduces the stack size. Now the old FP is at the top of the stack.
  8. ...
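
These steps are exactly what the standard x86 function prologue and epilogue do; a minimal sketch in C (32-bit cdecl assumed), with comments tying each line back to the numbered steps:

    int add(int a, int b) {   /* prologue: push ebp (step 3), mov ebp, esp
                                 (step 4), sub esp, N for locals (step 5) */
        int sum = a + b;      /* 'sum' lives in the newly allocated frame */
        return sum;           /* step 6: return value placed in EAX; epilogue:
                                 mov esp, ebp (step 7), pop ebp, then ret pops
                                 the return address pushed in step 2 */
    }

    int caller(void) {
        return add(1, 2);     /* step 1: arguments pushed right to left;
                                 step 2: call pushes the return address */
    }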

Stack Frame

  • The stack frame generally includes the following components
    • The return address
    • Argument variables passed on the stack
    • Local variables
    • Saved copies of any registers modified by the subprogram that need to be restored.



Stack Overflow


  • Introduction
  • Suggestion
    • If you are a programmer accepting arbitrary input into a stack variable (say, reading from the keyboard or over the network), you need to check how big that data is going to be
    • OS
      • The OS can make the stack not executable. 

Reference

[1] http://articles.manugarg.com/stack.html

Process Communication


Shared Memory Systems


  • In general, the memory to be shared in a shared-memory system is initially within the address space of a particular process, which needs to make system calls in order to make that memory publicly available to one or more other processes. 
  • Other processes which wish to use the shared memory must then make their own system calls to attach the shared memory area onto their address space. 
  • Generally a few messages must be passed back and forth between the cooperating processes first in order to set up and coordinate the shared memory access. 

Example" POSIX Shared Memory


  • Step 1: one of the processes involved allocates some shared memory, using shmget
    • int segment_id = shmget(IPC_PRIVATE, size, S_IRUSR | S_IWUSR)
      • The first parameter 
        • specifies the key (identifier) of the segment. IPC_PRIVATE creates a new shared memory segment.
      • The second parameter
        • How big the shared memory segment is to be, in bytes.
      • The third parameter
        • A set of bitwise ORed flags. In this case, the segment is being created for reading and writing.
      • The return value
        • an integer identifier
  • Step 2: Any process which wishes to use the shared memory must attach the shared memory to its address space, using shmat
    • char* shared_memory = (char *) shmat(segment_id, NULL, 0)
      • The first parameter
        • Specifies the key of the segment that the process wishes to attach to its address space
      • The second parameter
        • Indicates where the process wishes to have the segment attached. NULL indicates that the system should decide.
      • The third parameter
        • A flag for read-only operation. Zero indicates read-write; SHM_RDONLY indicates read-only.
      • The return value
        • The return value of shmat is a void *, which the process can use (type cast) as appropriate. In this example, it is being used as a character pointer. 
  • Step 3: The process may access the memory using the pointer returned by shmat. For example, using sprintf
    • sprintf(shared_memory, "Writing to shared memory\n");
  • Step 4: When a process no longer needs a piece of shared memory, it can be detached using shmdt
    • shmdt(shared_memory)
  • Step 5: And finally the process that originally allocated the shared memory can remove it from the system using shmctl
    • shmctl(segment_id, IPC_RMID, NULL)
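
Putting the five steps together, a minimal runnable sketch (System V shared memory; error checking omitted for brevity):

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/stat.h>

    int main(void) {
        const int size = 4096;
        int segment_id = shmget(IPC_PRIVATE, size, S_IRUSR | S_IWUSR); /* step 1 */
        char *shared_memory = (char *) shmat(segment_id, NULL, 0);     /* step 2 */

        sprintf(shared_memory, "Writing to shared memory\n");          /* step 3 */
        printf("%s", shared_memory);

        shmdt(shared_memory);                                          /* step 4 */
        shmctl(segment_id, IPC_RMID, NULL);                            /* step 5 */
        return 0;
    }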



Message-Passing


  • Message passing systems must support, at a minimum, system calls for "send message" and "receive message"
  • A communication link must be established between the cooperating processes before messages can be sent 
  • There are three key issues to be resolved in message passing systems
    • Direct or indirect communication (naming)
    • Synchronous or asynchronous communication
    • Automatic or explicit buffering

Naming

  • Direct communication
    • With direct communication, the sender must know the name of the receiver to which it wishes to send a message
  • Indirect communication
    • Multiple processes can share the same mailbox or boxes.
    • Only one process can read any given message in a mailbox. 
    • The OS must provide system calls to create and delete mailboxes, and to send and receive messages to/from mailboxes.

Synchronization

  • Either the sending or receiving of messages (or neither or both) may be either blocking or non-blocking.

Buffering

  • Messages are passed via queues, which may have one of three capacity configurations
    • Zero capacity
      • Messages cannot be stored in the queue, so senders must block until receivers accept the messages.
    • Bounded capacity
      • There is a certain pre-determined finite capacity in the queue. Senders must block if the queue is full, until space becomes available in the queue, but may be either blocking or non-blocking otherwise. 
    • Unbounded capacity
      • The queue has a theoretical infinite capacity, so senders are never forced to block. 



Reference

[1] https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/3_Processes.html

Process Memory, State, and Scheduling

Process Concept

  • A process is an instance of a program in execution


Process Memory




  • Text: 
    • comprises the compiled program code, read in from non-volatile storage when the program is launched. 
  • Data
    • Stores global and static variables, allocated and initialized prior to executing main.
  • Heap
    • Used for dynamic memory allocation, is managed via calls to new, delete, malloc, free etc. 
  • Stack
    • Used for local variables
    • Space on the stack is reserved for local variables when they are declared (at function entrance or elsewhere, depending on the language), and the space is freed up when the variables go out of scope
    • The stack is also used for function return values, and the exact mechanisms for stack management may be language specific. 
  • When processes are swapped out of memory and later restored, additional information must also be stored and restored. Key among them are the program counter and the value of all program registers

Process State


  • New
    • The process is in the stage of being created.
  • Ready
    • The process has all the resources available that it needs to run, but the CPU is not currently working on this process's instruction.
  • Running
    • The CPU is working on this process's instruction.
  • Waiting
    • The process cannot run at the moment, because it is waiting for some resource to become available or for some event to occur. For example, the process may be waiting for keyboard input, a disk access to complete, inter-process messages, a timer to go off, or a child process to finish.
  • Terminated
    • The process has completed.

Process Control Block

For each process, there is a Process Control Block (PCB), which stores the following (types of) process-specific information.



  • Process State
    • Running, waiting etc. 
  • Process ID
    • and also parent process ID
  • CPU registers and Program Counter
    • These need to be saved and restored when swapping processes in and out of the CPU
  • CPU scheduling information
    • Such as priority information and pointers to scheduling queues
  • Memory-Management information
    • e.g., page tables or segment tables
  • Accounting information
    • User and kernel CPU time consumed, account numbers, limits, etc. 
  • I/O Status information
    • Devices allocated, open file tables etc. 
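
A rough sketch of what such a structure might hold (field names and sizes are illustrative, not from any particular kernel):

    struct pcb {
        enum { NEW, READY, RUNNING, WAITING, TERMINATED } state; /* process state */
        int pid;                              /* process ID */
        int ppid;                             /* parent process ID */
        unsigned long registers[16];          /* saved CPU registers */
        unsigned long program_counter;        /* saved program counter */
        int priority;                         /* CPU scheduling information */
        void *page_table;                     /* memory-management information */
        unsigned long user_time, kernel_time; /* accounting information */
        int open_files[64];                   /* I/O status information */
        struct pcb *next;                     /* scheduling queue linkage */
    };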


Process Scheduling

  • The two main objectives of the process scheduling system are to keep the CPU busy at all times and to deliver "acceptable" response times for all programs, particularly interactive ones. 
  • The process scheduler must meet these objectives by implementing suitable policies for swapping processes in and out of the CPU.

Scheduling Queues

  • Job queue
    • All processes are stored in the job queue.
  • Ready queue
    • Processes in the Ready state are placed in the ready queue
  • Device queue
    • Processes waiting for a device to be available or to deliver data are placed on device queues. There is generally a separate device queue for each device.



Scheduler

  • Long-term scheduler
    • Typical of a batch system or a very heavily loaded system. It runs infrequently (such as when one process ends, selecting one or more to be loaded in from disk in its place), and can afford to take the time to implement intelligent and advanced scheduling algorithms
  • Short-term scheduler, or CPU scheduler
    • Runs very frequently, on the order of 100 milliseconds, and must very quickly swap one process out of the CPU and swap in another one.
  • Medium-term scheduler
    • Some systems also employ a medium-term scheduler.
    • When system loads get high, this scheduler will swap one or more processes out of the ready queue for a few seconds, in order to allow smaller, faster jobs to finish up quickly and clear the system. 




Reference

[1] https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/3_Processes.html

Monday, January 18, 2016

Online Hiring Problem

Problem

Suppose there are n candidates, and we do not want to interview all of them to find the best one. We also do not wish to hire and fire as we find better and better candidates.

Instead, we are willing to settle for a candidate who is close to the best, in exchange for hiring exactly once.

For each interview we must either immediately offer the position to the applicant or immediately reject the applicant.

What is the trade-off between minimizing the amount of interviewing and maximizing the quality of the candidate hired?

Code

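The code block is missing from the original post; a minimal sketch following CLRS's ONLINE-MAXIMUM(k, n) (0-indexed, as the analysis below notes; the score array is an assumed input):

    /* Reject the first k candidates, remembering the best score seen;
       then hire the first later candidate who beats that benchmark. */
    int online_maximum(const double *score, int n, int k) {
        double best_seen = score[0];
        for (int i = 1; i < k; i++)
            if (score[i] > best_seen)
                best_seen = score[i];
        for (int i = k; i < n; i++)
            if (score[i] > best_seen)
                return i;           /* hire candidate i */
        return n - 1;               /* no one beat the benchmark: hire the last */
    }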

Analysis

In the following, we analyze the problem to determine the value of k that maximizes the probability that we hire the best candidate. In the analysis, we assume the index starts with 1 (rather than 0 as shown in the code).

Let $B_i$ be the event that the best candidate is the $i$-th candidate.
Let $O_i$ be the event that none of the candidate in position $k+1$ to $i-1$ is chosen.
Let $S_i$ be the event that we hire the best candidate when the best candidate is in the $i$-th position.

We have $Pr(S_i) = Pr(B_i \cap O_i) = Pr(B_i) \cdot Pr(O_i)$ since $B_i$ and $O_i$ are independent events. Here $Pr(B_i) = 1/n$, and $Pr(O_i) = k/(i-1)$ because none of the candidates in positions $k+1$ to $i-1$ is chosen exactly when the best of the first $i-1$ candidates falls within the first $k$ positions.

$Pr(S) = \sum^n_{i=k+1} Pr(S_i)$
$ = \sum^n_{i=k+1} \frac{1}{n} \cdot \frac{k}{i-1}$
$ = \frac{k}{n} \sum^n_{i=k+1} \frac{1}{i-1}$
$ \leq \frac{k}{n} \cdot (\ln n - \ln k)$

Differentiating $\frac{k}{n} (\ln n - \ln k)$ with respect to $k$ gives $\frac{1}{n} (\ln n - \ln k - 1)$. Setting this derivative equal to 0, we see that the probability is maximized when $k = n/e$, where it is at least $1/e$.


Reference
[1] Introduction to Algorithms, CLRS


Instruction Set Randomization

Motivation

Attackers can inject instructions into executables. 

Definition

Instead of executing the executable directly, we add randomization to the executable, and de-randomize it at execution time.





Randomization

  • XOR
    • Use a key to XOR with the instructions, and XOR again at execution time (a sketch follows)
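
A toy sketch of the XOR scheme (purely illustrative; real ISR systems randomize at load time and de-randomize in the fetch/decode path): injected code never went through the encoding step, so de-randomizing it yields garbage instructions that crash rather than execute.

    #include <stdio.h>
    #include <stddef.h>

    /* The same XOR routine both randomizes and de-randomizes. */
    static void xor_bytes(unsigned char *code, size_t len, unsigned char key) {
        for (size_t i = 0; i < len; i++)
            code[i] ^= key;
    }

    int main(void) {
        unsigned char key = 0xA5;
        unsigned char program[] = { 0x90, 0x90, 0xC3 };   /* pretend instructions */

        xor_bytes(program, sizeof program, key);   /* randomize at load time */
        xor_bytes(program, sizeof program, key);   /* de-randomize at fetch time */

        printf("%02x %02x %02x\n", program[0], program[1], program[2]);
        return 0;
    }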



Reference

[1] https://www.youtube.com/watch?v=ZgNBjwXTrqA

Buffer Overflow

Definition

A buffer overflow occurs when a program or process tries to store more data in a buffer than it was intended to hold. Since buffers are created to contain a finite amount of data, the extra information -- which has to go somewhere -- can overflow into adjacent buffers, corrupting or overwriting the valid data held in them. 

In buffer overflow attacks, the extra data may contain code designed to trigger specific actions.
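
A classic illustration (the function names are hypothetical): copying unchecked input into a fixed-size stack buffer overwrites adjacent stack data, including the saved return address.

    #include <string.h>

    void vulnerable(const char *input) {
        char buf[16];
        strcpy(buf, input);        /* no bounds check: input longer than 16
                                      bytes overflows into adjacent memory */
    }

    void safer(const char *input) {
        char buf[16];
        strncpy(buf, input, sizeof(buf) - 1);  /* bounded copy */
        buf[sizeof(buf) - 1] = '\0';           /* ensure termination */
    }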


Reference

[1] http://searchsecurity.techtarget.com/definition/buffer-overflow

Address Space Layout Randomization (ASLR)

Definition

Address space layout randomization (ASLR) is a memory-protection technique for operating systems that guards against buffer-overflow attacks by randomizing the locations where system executables are loaded into memory.

Objective

The success of many cyberattacks, particularly zero-day exploits, relies on the hacker's ability to know or guess the position of processes and functions in memory.

ASLR is able to put address space targets in unpredictable locations. If an attacker attempts an exploit against an incorrect address space location, the target application will crash, stopping the attack and alerting the system.


Current Deployments

ASLR was created by the PaX Project as a Linux patch in 2001. 
  • It was integrated into the Windows operating system beginning with Vista in 2007. Prior to ASLR, the memory locations of files and applications were either known or easily determined. 
  • Adding ASLR to Vista increased the number of possible address space locations to 256, meaning attackers have only a 1 in 256 chance of finding the correct location to execute code.
  • Apple began including ASLR in Mac OS X 10.5 Leopard, and Apple iOS and Google Android both added ASLR in 2011.

Reference

[1] http://searchsecurity.techtarget.com/definition/address-space-layout-randomization-ASLR

Sunday, January 17, 2016

HoneyPots

Introduction

[1] Deception is a mechanism that attempts to distort or mislead an attacker into taking a course of action that is more suited to the goals of the defender.

A common deception defense is the use of network honeypots.

A honeypot is a computer system that is designed to be a trap for unauthorized accesses.

Honeypots are deployed within a network to appear like normal, active systems to an outsider.


How to build honeypots


  • Mimicking
    • One of the deception techniques is mimicking. A honeypot attempts to mimic a real system to fool the adversary into probing and/or attacking it. 
    • The honeypot responds to queries with information that represents a plausible system within the infrastructure, but unlike a normal system, it maintains very detailed logs of all interactions. From these detailed logs, administrators can gain insight into an attacker's goals and methods, as well as put other measures in place in hopes of preventing an attack. 

Reference

[1] Probabilistic Performance Analysis of Moving Target and Deception Reconnaissance Defenses, by Michael Crouse, in MTD15

Wednesday, January 13, 2016

Cache in Virtualized Environments

System Model for a Multi-core Processor




Cache Hierarchy

Reference [1]

Because of the long access time of main memory compared to fast processors, smaller but faster memories, called caches, are used to reduce the effective memory access time as seen by a processor.

Modern processors feature a hierarchy of caches.

"Higher-level" caches, which are closer the processor core are smaller but faster than lower-level caches, which are closer to main memory.

L1 caches 

Each core typically has two private top level caches

  • 1) one for data
  • 2) one for instructions
A typical L1 cache size is 32KB with 4-cycle access time, as in Intel Core and Xeon families. 

LLC

The LLC is shared among all cores of a multicore chip and is a unified cache, i.e., it holds both data and instructions. [1]

LLC sizes are measured in megabytes, and access latencies are on the order of 40 cycles.


L2 caches

Modern x86 processors typically also support core-private, unified L2 caches of intermediate size and latency. 



How it works

Any memory access first accesses the L1 cache; on a miss, the request is sent down the hierarchy until it hits in a cache or accesses main memory. 

The L1 is typically indexed by virtual address, while all other caches are indexed by physical address. 


Cache Access

Reference [1]

To exploit spatial locality, caches are organized in fixed-size lines, which are the units of allocation and transfer down the cache hierarchy. 

A typical line size B is 64 bytes.

The log2(B) lowest-order bits of the address, called the line offset, are used to locate a datum in the cache line. 



Set-Associative Caches


Caches today are usually set-associative, i.e., organized as S sets of W lines each, called a W-way set-associative cache, as shown in the following figure. [1]

When the cache is accessed, the set index field of the address, the log2(S) consecutive bits starting from bit log2(B), is used to locate a cache set. The remaining high-order bits are used as a tag for each cache line. 

After locating the cache set, the tag field of the address is matched against the tags of the W lines in the set to determine whether the access is a cache hit. 
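
A small sketch of this address decomposition (the values B = 64 and S = 1024 are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BITS 6    /* log2(64):   line-offset bits */
    #define SET_BITS 10    /* log2(1024): set-index bits   */

    int main(void) {
        uint64_t addr = 0x7f1234567890;
        uint64_t offset = addr & ((1ull << LINE_BITS) - 1);
        uint64_t set = (addr >> LINE_BITS) & ((1ull << SET_BITS) - 1);
        uint64_t tag = addr >> (LINE_BITS + SET_BITS);
        printf("tag=0x%llx set=%llu offset=%llu\n",
               (unsigned long long)tag, (unsigned long long)set,
               (unsigned long long)offset);
        return 0;
    }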


Cache Replacement


As memory is much larger than the cache, more than W memory lines may map to the same cache set, potentially resulting in cache contention. If an access misses in the cache, and all lines of the matching set are in use, one cache line must be evicted to free a slot for the new cache line being fetched from the next level of cache, or from main memory in the case of the LLC. The cache's replacement policy determines the line to evict. Typical replacement policies are approximations of least-recently-used (LRU). [1]
Traditional Cache



Per-Core Slice Cache

[1] Modern Intel processors, starting with the Sandy Bridge microarchitecture, use a more complex architecture for the LLC, to improve its performance.

The LLC is divided into per-core slices, which are connected by a ring bus. Slices can be accessed concurrently and are effectively separate caches, although the bus ensures that each core can access the full LLC (with higher latency for remote slices).

Sliced Cache

Ring bus architecture and sliced LLC




Reference



[1] Last-Level Cache Side-Channel Attacks are Practical, by Fangfei Liu et al, in Security&Privacy 2015

Virtual Address Space in Virtualized Environments

Page

Reference [1] 
  • A process executes in its private virtual address space, composed of pages, each representing a contiguous range of addresses.
  • The typical page size is 4KB
  • Each page is mapped to an arbitrary frame in physical memory


Address Mapping

Reference [1]

In virtualized environments there are two levels of address-space virtualization. 

1) the virtual addresses of a process -> guest's notion of physical address, i.e., the VM's emulated physical memory

2) guest physical addresses -> physical addresses of the processor


Page Translation

Reference [1]

Translations from virtual pages to physical frames are stored in page tables. 

Processors cache recently used page table entries in the translation lookaside buffer (TLB).

The TLB is a scarce processor resource with a small number of entries. 

Large pages use the TLB more efficiently, since fewer entries are needed to map a particular region of memory. As a result, the performance of applications with large memory footprints, such as Oracle databases or high-performance computing applications, can benefit from using large pages. 

For the same reason, VMMs, such as VMware ESX and Xen HVM, also use large pages for mapping guest physical memory. 



Reference

[1] Last-Level Cache Side-Channel Attacks are Practical, by Fangfei Liu et al, in Security&Privacy 2015

Covert Channel Attack

Introduction

[1] Covert channels and side channels are two types of information leakage channels. A covert channel uses mechanisms that are not intended for communications, e.g., writing and checking if a file is locked to convey a "1" or "0". 

In a covert channel, an insider process leaks information to an outsider process not normally allowed to access that information. The insider (sending) process could be a Trojan horse program previously inserted stealthily into the computer. An outsider (receiver) process needs only to be an unprivileged process. 


Reference

[1] Covert and Side Channels due to Processor Architecture, by Zhenghong Wang and Ruby B. Lee, in ACSAC 2006

Tuesday, January 12, 2016

Network Topology in the Cloud -- ToR

Top of Rack





[1] demonstrates that most of the topology in the Amazon cloud follows the ToR model.
  • All servers in a rack are first connected to a separate Top of Rack (ToR) switch, and then the ToR switch is connected to aggregation switches. 
  • Such a topology has currently become a mainstream network topology in data centers.







Reference

[1] A Measurement Study on Co-residence Threat inside the Cloud, by Zhang Xu, Haining Wang and Zhenyu Wu, in USENIX Security 2015

Virtual Private Cloud (VPC)

VPC Introduction

[1] VPC is a logically isolated networking environment with a separate private IP space and routing configuration. 




Characteristics

  • After creating a VPC, a customer can launch instances into the VPC, instead of the large EC2 network pool. 
  • The customer can also divide a VPC into multiple subnets, where each subnet can have a preferred availability zone to place instances.
  • The private IP address of an instance in a VPC is only known to its owner. It cannot be detected by other users. Thus, it significantly reduces the threat of co-residence. 



Reference

[1] A Measurement Study on Co-residence Threat inside the Cloud, by Zhang Xu, Haining Wang and Zhenyu Wu, in USENIX Security 2015

IP Scan

Tools

  • ZMap

Ports

  • FTP: 20,21
  • SSH: 22
  • Telnet: 23
  • SMTP: 25, 587
  • WHOIS: 43
  • DNS: 53
  • DHCP: 67, 68
  • Finger Protocol: 79
  • HTTP: 80
  • SQL: 118
  • HTTPS: 443
  • MySQL: 3306

Sunday, January 10, 2016

Transactions and Replications

One Copy Serializability

  • Replicated transactional service
    • Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
  • Effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single data item
  • Additional complications
    • Failures should be serialized w.r.t. transactions, i.e., any failure observed by a transaction must appear to have happened before a transaction started

Replication Scheme

  • Primary copy
    • All client requests are directed to a single primary server
  • Read one -- write all
    • cannot handle network partitions
    • Each write operation sets a write lock to each replica manager
    • Each read sets a read lock at one replica manager
  • Schemes that can handle network partitions
    • Available copies with validation
    • Quorum consensus
    • Virtual Partition

Available Copies Replication

  • Can handle the case when some replica managers are unavailable, because they have failed or due to communication failure
  • Reads can be performed by any available replica manager, but writes must be performed by all available replica managers
  • Normal case is like read one/write all
    • As long as the set of available replica managers does not change during a transaction
  • RM failure case
    • One copy serializability requires that failures and recovery be serialized w.r.t transactions
    • This is not achieved when different transactions make conflicting failure observations
    • Examples show that local concurrency control is not enough
    • Additional concurrency control procedure (called local validation) has to be performed to ensure correctness
  • Available copies with local validation assume no network partition
    • i.e., functioning replica managers can communicate with one another.
  • Local validation
    • Before a transaction commits it checks for failures (and recoveries) of replica managers of data items it has accessed

Handling Network Partitions

  • Network partitions separate replica managers into two or more subgroups, in such a way that the members of a group can communicate with one another but members of different subgroups cannot communicate
  • Optimistic approaches
    • Available copies with validation
  • Pessimistic approaches
    • Quorum consensus

Available Copies with Validation

  • Available copies algorithm applied within each partition
    • Maintains availability for read operations
  • When partition is repaired, possibly conflicting transactions in separate partitions are validated
    • The effects of a committed transaction that is now aborted on validation will have to be undone
      • Only feasible for applications where such compensating actions can be taken
  • Validation
    • Versions vector (write-write conflicts)
    • Precedence graphs (each partition maintains a log of data items affected by the Read and Write operations of transactions)
    • Log used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
      • No cycle in graph corresponding to each partition
    • If there are cycles in the graph, validation fails.

Quorum Consensus

  • A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
  • Majority voting is one instance of a quorum consensus scheme (see the sketch below)
    • R + W > total number of votes in the group
    • W > half of the total votes
    • Ensures that each read quorum intersects a write quorum, and that two write quorums will intersect
  • Each replica has a version number that is used to detect whether the replica is up to date. 
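
A tiny sketch of the two quorum rules (the vote counts in main are illustrative):

    #include <stdbool.h>
    #include <stdio.h>

    /* Check the two intersection rules for read/write quorum sizes. */
    static bool valid_quorums(int r, int w, int total_votes) {
        return (r + w > total_votes)    /* every read quorum meets a write quorum */
            && (2 * w > total_votes);   /* any two write quorums intersect */
    }

    int main(void) {
        printf("%d\n", valid_quorums(2, 2, 3));   /* 1: R=2, W=2 of 3 is valid */
        printf("%d\n", valid_quorums(1, 1, 3));   /* 0: quorums too small */
        return 0;
    }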

Virtual Partition Scheme

  • Combines available copies and quorum consensus
  • Virtual partition = set of replica managers that have a read and write quorum
  • If a virtual partition can be formed, available copies is used
    • Improve performance of Reads
  • If a failure occurs, and virtual partition changes during a transaction, it is aborted
  • Have to ensure virtual partitions do not overlap.


CAP Conjecture

  • Is it possible to achieve consistency, availability, and partition tolerance?
  • Classic distributed systems: focused on ACID semantics
    • Atomic
    • Consistent
    • Isolated
    • Durable
  • Modern internet system: focused on BASE
    • Basically Available
    • Soft-state (or scalable)
    • Eventually consistent
  • ACID v.s. BASE

Why the Divide

  • What goals might you want for a shared-data system
    • consistency
    • availability
    • partition tolerance
  • Strong consistency
    • all clients see the same view, even in the presence of updates
  • High availability
    • All clients can find some replica of the data, even in the presence of failures
  • Partition-tolerance
    • The system properties hold even when the system is partitioned. 

CAP Conjecture

  • You can only have two out of these three properties
  • The choice of which feature to discard determines the nature of your system. 

Consistency and Availability

  • Comment
    • Providing transaction semantics requires all nodes to be in contact with each other
  • Example
    • Single-site clustered databases
  • Typical features
    • Two-phase commit
    • Cache invalidation protocol
    • Classic distributed system style

Consistency and Partition Tolerance

  • Comment
    • If one is willing to tolerate system-wide blocking, then one can provide consistency even when there are temporary partitions
  • Examples
    • Distributed databases
    • Distributed locking
    • Quorum (majority) protocols
  • Typical features
    • Pessimistic locking
    • Minority partitions unavailable
    • Also common distributed system
      • voting vs. primary replicas

Partition-Tolerance and Availability

  • Comment
    • sacrifice consistency
  • Examples
    • DNS
    • Web cache
    • Coda
    • Bayou
  • Typical features
    • TTLs and lease cache management
    • Optimistic updating with conflict resolution

Techniques

  • Expiration-based caching: AP
  • Quorum/majority algorithm: PC
  • Two phase commit: AC






Saturday, January 9, 2016

Cache-based Covert Channel

Background

Cache-based Covert Channel 

[1] Cache load measurements create very effective covert channels between cooperating processes running in different VMs. 

In practice, this is not a major threat for current deployments, since in most cases the cooperating processes can simply talk to each other over a network. 

However, covert channels become significant when communication is forbidden by information flow control (IFC) mechanisms such as sandboxing and IFC kernels. The latter is a promising emerging approach to improving security (e.g., web-server functionality). 

[1] explains more on the covert-channel in Section 8.1.


Reference

[1]  Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds, by Restenpart, T. et al., in CCS09

How to Measure Cache Usage


Prime + Trigger + Probe


[1] demonstrates how to utilize the Prime + Probe technique to measure the cache activity, and extend it to the following Prime + Trigger + Probe measurement to support the setting of time-shared virtual machines.

  • The probing instance first allocates a contiguous buffer B of b bytes
    • Here b should be large enough that a significant portion of the cache is filled by B.
  • Let s be the cache line size, in bytes.
  • Then the probing instance performs the following steps to generate each load sample
    1. Prime: Read B at s-byte offsets in order to ensure it is cached. 
    2. Trigger: Busy-loop until the CPU's cycle counter jumps by a large value
      • This means our VM was preempted by the Xen scheduler, hopefully in favor of the sender VM. 
    3. Probe: Measure the time it takes to again read B at s-byte offsets. 
When reading b/s memory locations in B, we use a pseudorandom order, and the pointer-chasing technique to prevent the CPU's hardware prefetcher from hiding the latencies. 

The time of the final step's read is the load sample, measured in number of CPU cycles. These load samples will be strongly correlated with use of the cache during the trigger step, since that usage will evict some portion of the buffer and thereby drive up the read time during the probe phase. 
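
A rough sketch of the prime and probe steps (x86 with GCC assumed; __builtin_ia32_rdtsc reads the cycle counter; the trigger step, pseudorandom access order, and pointer chasing from the paper are omitted for brevity):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define S 64                    /* assumed cache line size, in bytes */
    #define B (4 * 1024 * 1024)     /* buffer big enough to cover the cache */

    int main(void) {
        volatile char *buf = malloc(B);

        for (size_t i = 0; i < B; i += S)   /* prime: pull the buffer into cache */
            (void)buf[i];

        /* ... trigger: busy-loop until preempted in favor of the sender VM ... */

        uint64_t t0 = __builtin_ia32_rdtsc();
        for (size_t i = 0; i < B; i += S)   /* probe: time the re-read */
            (void)buf[i];
        uint64_t load_sample = __builtin_ia32_rdtsc() - t0;

        printf("load sample: %llu cycles\n", (unsigned long long)load_sample);
        free((void *)buf);
        return 0;
    }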



Reference 


[1] Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds, by Restenpart, T. et al., in CCS09

Threat of Covert Channel Attacks

Summary

  • Attackers can build various side channels to circumvent the logical isolation in cloud physical machines, and obtain sensitive information from co-resident VMs
    • Coarse-grained, e.g., workloads and web traffic rates
      • Since the cache utilization rate has a large impact on the execution time of the cache read operation, attackers can infer the victim's cache usage and workload information, by applying the Prime+Probe technique. 
      • Similarly, they can estimate the victim's web traffic rate, which also has a strong correlation with the execution time of cache operations. [2]
      • [1] demonstrates a clear correlation between a victim's web traffic rate and the load sample. 
    • Fine-grained, e.g., cryptographic keys
      • Attackers can exploit shared hardware resources, such as the instruction cache, to extract cryptographic keys. Specifically, the following challenges are overcome
        • Dealing with core migrations and determining if an observation is associated with the victim
        • Filtering out hardware and software noise, and regaining access to the target CPU core with sufficient frequency
  • For clever attackers, even seemingly innocuous information like workload statistics can be useful.
    • For example, such data can be used to identify when the system is most vulnerable, i.e., the time to launch further attacks, such as Denial of Service attacks. [9]






Reference
[2] Using Virtual Machine Allocation Policies to Defend against Co-resident Attacks in Cloud Computing, by Yi Han et al, in Transactions on Dependable and Secure Computing

Web Applications Client & Server





Figure 1 [1] illustrates the web application architecture on the server side and the client side.

Server Side

  • Logic Layer
    • Implements the application business logic using high-level programming languages, such as Java, PHP, or Python. 
  • Web Server Layer
    • Receives HTTP requests and passes each request to the appropriate server-side program, e.g., Apache web server, Windows IIS, or Nginx.
  • Data Storage Layer
    • Stores the web application state and user data. Popular data storage systems are traditional SQL databases, which include MySQL, PostgreSQL, and MSSQL
  • Infrastructure Layer
    • Runs the operating system. An infrastructure could be a physical machine or a virtualization platform which manages multiple virtual machines. 

Client Side

The client side receives the HTTP response from the server side, and the job of the client is to convert the HTML contained in the HTTP response into a graphical interface for the user. 


  • Logic Layer (Presentation Layer)
    • It is written in a combination of HTML, CSS, and JavaScript, with JavaScript providing a way for the server-side code to execute application logic on the client

  • Browser 
    • Retrieves the presentation layer code from the server, interprets it, and presents it as a graphic interface to the user. 

  • Storage Layer
    • For the presentation layer code to store data. Available storage methods include cookies, local storage, IndexedDB, and File APIs.
  • Operating System Layer
    • Runs the browser

Reference
[1] Toward a Moving Target Defense for Web Applications, by Marthony Taguinod, in International Conference on Information Reuse and Integration 2015

Web Application Moving Target Defense

Motivation

  • Web applications remain the most popular way for businesses to provide services over the Internet. They are the "front door" of many companies, and therefore their security is of paramount importance.
  • Example [1]
    • JPMorgan Chase breach in 2014 affected 76 million US households
    • Bloomberg reported that the hackers exploited an overlooked flaw in one of the bank's websites.
  • The efforts in discovering and fixing vulnerabilities are not enough to protect web applications for many reasons [1]
    • The increasing complexity of modern web applications brings inevitable risks that cannot be fully mitigated in the process of web application development and deployment
    • Attackers can take their time, to understand the web application's functionality and technology stack, before launching an attack

Objective

  • Ref [1]
  • The key issue is to design MTD mechanisms that 
    • Prevent or disrupt a vulnerability or exploit
    • While still providing the identical functionality



Issues

  • Choose what component to move in a web application
  • Decide the optimal time and how often to move components


Proposals

  • Change the server-side language used in a web application 
    • automatically translating server-side web application code to another language in order to prevent Code Injection exploits
  • Shift the database used in a web application
    • transform the backend SQL database into different implementations that speak different dialects in order to prevent SQL Injection exploits.




Reference

[1] Toward a Moving Target Defense for Web Applications, by Marthony Taguinod et al., in International Conference on Information Reuse and Integration 2015

Friday, January 8, 2016

Fault Tolerance

Definitions

  • Availability
    • Probability the system operates correctly at any given moment
  • Reliability
    • Ability to run correctly for a long interval of time
  • Safety
    • Failure to operate correctly does not lead to catastrophic failures
  • Maintainability
    • Ability to "easily" repair a failed system

Failure Models

  • Different types of failures



Two-Army Problem



Byzantine Agreement 

  • [Lamport et al. (1982)]
  • Goal
    • Each process learn the true values sent by correct processes
  • Assumption
    • Every message that is sent is delivered correctly
    • The receiver knows who sent the message
    • Message delivery time is bounded
  • Byzantine Agreement Result
    • In a system with m faulty processes, agreement can be achieved only if 2m+1 processes are functioning correctly.
    • Note
      • This result only guarantees that each process receives the true values sent by correct processes, but it does not identify the correct processes!


Byzantine General Problem

  • Phase 1: Generals announce their troop strengths to each other




  • Phase 2: Each general constructs a vector with all troop strengths


  • Phase 3: Generals send their vectors to each other and compute the majority vote

Reliable Group Communication

  • Reliable Multicast
    • All nonfaulty processes which do not join/leave during communication receive the message

  • Atomic Multicast
    • All messages are delivered in the same order to all processes

View Delivery

  • A view reflects current membership of group
  • A view is delivered when a membership change occurs and the application is notified of the change
    • Receiving a view is different from delivering a view
      • All members have to agree to the delivery of a view
  • View synchronous group communication
    • The delivery of a new view draws a conceptual line across the system and every message is either delivered on one side or the other of that line


Atomic Multicast

  • All messages are delivered in the same order to "all" processes
  • Group view
    • The set of processes known by the sender when it multicasts the message
  • Virtual synchronous multicast
    • A message multicast to a group view G is delivered to all nonfaulty processes in G
      • If the sender fails after sending the message, the message may be delivered to no one

Virtual Synchrony Implementation
  • Only stable messages are delivered
  • Stable message
    • A message received by all processes in the message's group view
  • Assumptions (can be ensured by using TCP)
    • Point-to-point communication is reliable
    • Point-to-point communication ensures FIFO-ordering


Message Ordering

  • Total ordering does not imply causality or FIFO!




[Hackerrank] Grid Challenge

Problem

Given a square grid G of size N in which each cell contains a lowercase letter. Denote the character in the ith row and jth column as G[i][j].
You can perform one operation as many times as you like: swap two column-adjacent characters in the same row, G[i][j] and G[i][j+1], for any valid i, j.
Is it possible to rearrange the grid such that the following condition is true?
G[i][1] <= G[i][2] <= ... <= G[i][N] for 1 <= i <= N, and 
G[1][j] <= G[2][j] <= ... <= G[N][j] for 1 <= j <= N
In other words, is it possible to rearrange the grid such that every row and every column is lexicographically sorted?
Note: c1 <= c2 if letter c1 is equal to c2 or comes before c2 in the alphabet.
Input Format
The first line contains T, the number of testcases. In each testcase you will be given N. The following N lines contain N lowercase English letters each, describing the grid.
Output Format
Print T lines. On the ith line print YES if it is possible to rearrange the grid in the ith testcase or NO otherwise.
Constraints 
1 <= T <= 100 
1 <= N <= 100 
G[i][j] will be a lowercase letter
Sample Input
1
5
ebacd
fghij
olmkn
trpqs
xywuv
Sample Output
YES



Analysis


  • Sort each row
  • Then check each column to verify that the letters are in non-decreasing order

Code
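
The code block is missing from the original post; a straightforward sketch of the approach above, reading the input format described earlier:

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_char(const void *a, const void *b) {
        return *(const char *)a - *(const char *)b;
    }

    int main(void) {
        int t;
        scanf("%d", &t);
        while (t--) {
            int n;
            scanf("%d", &n);
            char grid[100][101];
            for (int i = 0; i < n; i++) {
                scanf("%100s", grid[i]);
                qsort(grid[i], n, 1, cmp_char);   /* sort each row */
            }
            int ok = 1;                           /* check each column */
            for (int j = 0; j < n && ok; j++)
                for (int i = 1; i < n && ok; i++)
                    if (grid[i][j] < grid[i - 1][j])
                        ok = 0;
            puts(ok ? "YES" : "NO");
        }
        return 0;
    }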