Tag Archives: applications

[repost ]IBM’s Corelet Language: Programming Like the Human Brain

original:http://www.developer.com/lang/other/ibms-corelet-language-programming-like-the-human-brain.html

No stranger to creating new programming languages and paradigm, IBM has embarked on an effort to complete a computer language that enables programmers to build applications that work like the human brain.

Over the years, IBM has invented languages such as Fortran, RPG and a host of others. Now, IBM Research has created a new programming model to supportchips that mimic the workings of the human brain, known as Systems of Neuromorphic Adaptive Plastic Scalable Electronics, or SyNAPSE, chips.

In a project funded by the Defense Advanced Research Projects Agency (DARPA), IBM on Aug. 8 announced a breakthrough software ecosystem designed for programming silicon chips that have an architecture inspired by the function, low power and compact volume of the brain. The technology could enable a new generation of intelligent sensor networks that mimic the brain’s perception, action and cognition abilities, according to IBM.

The company’s long-term goal is to build a chip system with 10 billion neurons and a hundred trillion synapses, while consuming only 1kilowatt of power and occupying less than two liters of volume, Big Blue said.

To get there, IBM researchers had to deliver not only new hardware, but also a new software paradigm. The vendor said the new programming model is dramatically different from traditional software:IBM’s new programming model breaks the mold of sequential operation underlying today’s von Neumann architectures and computers. Instead, it is tailored for a new class of distributed, highly interconnected, asynchronous, parallel, large-scale cognitive computing architectures.

“Architectures and programs are closely intertwined, and a new architecture necessitates a new programming paradigm,” said Dr. Dharmendra Modha, principal investigator and senior manager for the project in IBM Research. “We are working to create aFORTRAN for synaptic computing chips. While complementing today’s computers, this will bring forth a fundamentally new technological capability in terms of programming and applying emerging learning systems.”

Modha said IBM had to come up with a new programming language for its new non-von Neumann cognitive computing system architecture, known as TrueNorth. He explained that today’s computers are left-brained and SyNAPSE is right-brained. Being left-brained refers to being more analytical, logical and objective, but SyNAPSE is more intuitive, thoughtful and subjective—all right-brained characteristics. So the sequential programming paradigm is unsuited for SyNAPSE

The new programming model consists of a high-level description of a “program” that is based on composable, reusable building blocks called “corelets .” Each corelet represents a complete blueprint of a network of neurosynaptic cores that specifies a based-level function.

Inner workings of a corelet are hidden, so only its external inputs and outputs are exposed to other programmers, who can concentrate on what the corelet does, rather than how it does it. Corelets can be combined to produce new corelets that are larger, more complex or have added functionality.

IBM also has created an object-oriented Corelet language for creating, composing and decomposing corelets. This language enables the construction of complex cognitive algorithms and applications, while being efficient for TrueNorth and effective for programmer productivity.

IBM said object-oriented programming (OOP) is ideal for implementing corelets, and the Corelet language includes fundamental features of OOP, including encapsulation, inheritance and polymorphism.

“Defining a corelet as a class in an OOP framework grants us encapsulation, inheritance and polymorphism, and dramatically improves the design, structure, modularity, correctness, consistency, compactness and reusability of code,” Modha and a team of IBM researchers wrote in a paper on the language.

The research team also said IBM has implemented the Corelet language using MATLAB OOP, “which has the additional advantage of being a compact language for matrix and vector expressions and computations.”

In addition, IBM has delivered a Corelet library that acts as an ever-growing repository of reusable corelets from which programmers can compose new corelets. The library is a repository of consistent, verified, parameterized, scalable and composable functional primitives, according to the company.

To boost programmer productivity, IBM has designed and implemented a repository of more than 100 corelets in less than one year. Every time a new corelet is written—either from scratch or by composition—it can be added back to the library, which keeps growing in a self-reinforcing way.

Finally, IBM also constructed a Corelet laboratory. This programming environment integrates with the TrueNorth architectural simulator, Compass, to support all aspects of the programming cycle—from design, through development, debugging and up to deployment.

In one long, expressive sentence, the research team summed up the Corelet language, the overall SyNAPSE environment and the move to cognitive computing: “The value of the new paradigm to the programmer is freedom from thinking in terms of low-level hardware primitives; availability of tools to design at the functional level; ability to use a divide-and-conquer strategy in the process of creating and verifying individual modules separately; a new way of thinking in terms of simple modular blocks and their hierarchical composition, rather than having to deal with an unmanageably large network of neurosynaptic cores directly; guaranteed implementability on TrueNorth; ability to verify correctness, consistency and completeness; ability to reuse code and components; ease of large-scale collaboration; ability to configure more neurosynaptic cores per line of code and unit of time; access to an end-to-end environment for creating, compiling, executing and debugging; and the ability to use the same conceptual metaphor across functional blocks that range from a handful of synapses and neurons to networks of neurosynaptic cores with progressively increasing size and complexity.”

Meanwhile, as IBM researchers look at delivering more and more complex, large-scale TrueNorth programs, they are currently extending the programming paradigm using MATLAB’s Parallel Computing Toolbox.

[repost ]Hadoop MapReduce Next Generation – Writing YARN Applications

original:http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

Purpose

This document describes, at a high-level, the way to implement new Applications for YARN.

Concepts and Flow

The general concept is that an ‘Application Submission Client’ submits an ‘Application’ to the YARN Resource Manager. The client communicates with the ResourceManager using the ‘ClientRMProtocol’ to first acquire a new ‘ApplicationId’ if needed via ClientRMProtocol#getNewApplication and then submit the ‘Application’ to be run via ClientRMProtocol#submitApplication. As part of the ClientRMProtocol#submitApplication call, the client needs to provide sufficient information to the ResourceManager to ‘launch’ the application’s first container i.e. the ApplicationMaster. You need to provide information such as the details about the local files/jars that need to be available for your application to run, the actual command that needs to be executed (with the necessary command line arguments), any Unix environment settings (optional), etc. Effectively, you need to describe the Unix process(es) that needs to be launched for your ApplicationMaster.

The YARN ResourceManager will then launch the ApplicationMaster (as specified) on an allocated container. The ApplicationMaster is then expected to communicate with the ResourceManager using the ‘AMRMProtocol’. Firstly, the ApplicationMaster needs to register itself with the ResourceManager. To complete the task assigned to it, the ApplicationMaster can then request for and receive containers via AMRMProtocol#allocate. After a container is allocated to it, the ApplicationMaster communicates with the NodeManager using ContainerManager#startContainer to launch the container for its task. As part of launching this container, the ApplicationMaster has to specify the ContainerLaunchContext which, similar to the ApplicationSubmissionContext, has the launch information such as command line specification, environment, etc. Once the task is completed, the ApplicationMaster has to signal the ResourceManager of its completion via the AMRMProtocol#finishApplicationMaster.

Meanwhile, the client can monitor the application’s status by querying the ResourceManager or by directly querying the ApplicationMaster if it supports such a service. If needed, it can also kill the application via ClientRMProtocol#forceKillApplication.

Interfaces

The interfaces you’d most like be concerned with are:

  • ClientRMProtocol – Client<–>ResourceManager
    The protocol for a client that wishes to communicate with the ResourceManager to launch a new application (i.e. the ApplicationMaster), check on the status of the application or kill the application. For example, a job-client (a job launching program from the gateway) would use this protocol.
  • AMRMProtocol – ApplicationMaster<–>ResourceManager
    The protocol used by the ApplicationMaster to register/unregister itself to/from the ResourceManager as well as to request for resources from the Scheduler to complete its tasks.
  • ContainerManager – ApplicationMaster<–>NodeManager
    The protocol used by the ApplicationMaster to talk to the NodeManager to start/stop containers and get status updates on the containers if needed.

Writing a Simple Yarn Application

Writing a simple Client

  • The first step that a client needs to do is to connect to the ResourceManager or to be more specific, the ApplicationsManager (AsM) interface of the ResourceManager.
        ClientRMProtocol applicationsManager; 
        YarnConfiguration yarnConf = new YarnConfiguration(conf);
        InetSocketAddress rmAddress = 
            NetUtils.createSocketAddr(yarnConf.get(
                YarnConfiguration.RM_ADDRESS,
                YarnConfiguration.DEFAULT_RM_ADDRESS));             
        LOG.info("Connecting to ResourceManager at " + rmAddress);
        configuration appsManagerServerConf = new Configuration(conf);
        appsManagerServerConf.setClass(
            YarnConfiguration.YARN_SECURITY_INFO,
            ClientRMSecurityInfo.class, SecurityInfo.class);
        applicationsManager = ((ClientRMProtocol) rpc.getProxy(
            ClientRMProtocol.class, rmAddress, appsManagerServerConf));
  • Once a handle is obtained to the ASM, the client needs to request the ResourceManager for a new ApplicationId.
        GetNewApplicationRequest request = 
            Records.newRecord(GetNewApplicationRequest.class);              
        GetNewApplicationResponse response = 
            applicationsManager.getNewApplication(request);
        LOG.info("Got new ApplicationId=" + response.getApplicationId());
  • The response from the ASM for a new application also contains information about the cluster such as the minimum/maximum resource capabilities of the cluster. This is required so that to ensure that you can correctly set the specifications of the container in which the ApplicationMaster would be launched. Please refer to GetNewApplicationResponse for more details.
  • The main crux of a client is to setup the ApplicationSubmissionContext which defines all the information needed by the ResourceManager to launch the ApplicationMaster. A client needs to set the following into the context:
    • Application Info: id, name
    • Queue, Priority info: Queue to which the application will be submitted, the priority to be assigned for the application.
    • User: The user submitting the application
    • ContainerLaunchContext: The information defining the container in which the ApplicationMaster will be launched and run. The ContainerLaunchContext, as mentioned previously, defines all the required information needed to run the ApplicationMaster such as the local resources (binaries, jars, files etc.), security tokens, environment settings (CLASSPATH etc.) and the command to be executed.
        // Create a new ApplicationSubmissionContext
        ApplicationSubmissionContext appContext = 
            Records.newRecord(ApplicationSubmissionContext.class);
        // set the ApplicationId 
        appContext.setApplicationId(appId);
        // set the application name
        appContext.setApplicationName(appName);
    
        // Create a new container launch context for the AM's container
        ContainerLaunchContext amContainer = 
            Records.newRecord(ContainerLaunchContext.class);
    
        // Define the local resources required 
        Map<String, LocalResource> localResources = 
            new HashMap<String, LocalResource>();
        // Lets assume the jar we need for our ApplicationMaster is available in 
        // HDFS at a certain known path to us and we want to make it available to
        // the ApplicationMaster in the launched container 
        Path jarPath; // <- known path to jar file  
        FileStatus jarStatus = fs.getFileStatus(jarPath);
        LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
        // Set the type of resource - file or archive
        // archives are untarred at the destination by the framework
        amJarRsrc.setType(LocalResourceType.FILE);
        // Set visibility of the resource 
        // Setting to most private option i.e. this file will only 
        // be visible to this instance of the running application
        amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);          
        // Set the location of resource to be copied over into the 
        // working directory
        amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath)); 
        // Set timestamp and length of file so that the framework 
        // can do basic sanity checks for the local resource 
        // after it has been copied over to ensure it is the same 
        // resource the client intended to use with the application
        amJarRsrc.setTimestamp(jarStatus.getModificationTime());
        amJarRsrc.setSize(jarStatus.getLen());
        // The framework will create a symlink called AppMaster.jar in the 
        // working directory that will be linked back to the actual file. 
        // The ApplicationMaster, if needs to reference the jar file, would 
        // need to use the symlink filename.  
        localResources.put("AppMaster.jar",  amJarRsrc);    
        // Set the local resources into the launch context    
        amContainer.setLocalResources(localResources);
    
        // Set up the environment needed for the launch context
        Map<String, String> env = new HashMap<String, String>();    
        // For example, we could setup the classpath needed.
        // Assuming our classes or jars are available as local resources in the
        // working directory from which the command will be run, we need to append
        // "." to the path. 
        // By default, all the hadoop specific classpaths will already be available 
        // in $CLASSPATH, so we should be careful not to overwrite it.   
        String classPathEnv = "$CLASSPATH:./*:";    
        env.put("CLASSPATH", classPathEnv);
        amContainer.setEnvironment(env);
    
        // Construct the command to be executed on the launched container 
        String command = 
            "${JAVA_HOME}" + /bin/java" +
            " MyAppMaster" + 
            " arg1 arg2 arg3" + 
            " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
            " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";                     
    
        List<String> commands = new ArrayList<String>();
        commands.add(command);
        // add additional commands if needed                
    
        // Set the command array into the container spec
        amContainer.setCommands(commands);
    
        // Define the resource requirements for the container
        // For now, YARN only supports memory so we set the memory 
        // requirements. 
        // If the process takes more than its allocated memory, it will 
        // be killed by the framework. 
        // Memory being requested for should be less than max capability 
        // of the cluster and all asks should be a multiple of the min capability. 
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(amMemory);
        amContainer.setResource(capability);
    
        // Set the container launch content into the ApplicationSubmissionContext
        appContext.setAMContainerSpec(amContainer);
  • After the setup process is complete, the client is finally ready to submit the application to the ASM.
        // Create the request to send to the ApplicationsManager 
        SubmitApplicationRequest appRequest = 
            Records.newRecord(SubmitApplicationRequest.class);
        appRequest.setApplicationSubmissionContext(appContext);
    
        // Submit the application to the ApplicationsManager
        // Ignore the response as either a valid response object is returned on 
        // success or an exception thrown to denote the failure
        applicationsManager.submitApplication(appRequest);
  • At this point, the ResourceManager will have accepted the application and in the background, will go through the process of allocating a container with the required specifications and then eventually setting up and launching the ApplicationMaster on the allocated container.
  • There are multiple ways a client can track progress of the actual task.
    • It can communicate with the ResourceManager and request for a report of the application via ClientRMProtocol#getApplicationReport.
            GetApplicationReportRequest reportRequest = 
                Records.newRecord(GetApplicationReportRequest.class);
            reportRequest.setApplicationId(appId);
            GetApplicationReportResponse reportResponse = 
                applicationsManager.getApplicationReport(reportRequest);
            ApplicationReport report = reportResponse.getApplicationReport();

      The ApplicationReport received from the ResourceManager consists of the following:

      • General application information: ApplicationId, queue to which the application was submitted, user who submitted the application and the start time for the application.
      • ApplicationMaster details: the host on which the ApplicationMaster is running, the rpc port (if any) on which it is listening for requests from clients and a token that the client needs to communicate with the ApplicationMaster.
      • Application tracking information: If the application supports some form of progress tracking, it can set a tracking url which is available via ApplicationReport#getTrackingUrl that a client can look at to monitor progress.
      • ApplicationStatus: The state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is set to FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check for the actual success/failure of the application task itself. In case of failures, ApplicationReport#getDiagnostics may be useful to shed some more light on the the failure.
    • If the ApplicationMaster supports it, a client can directly query the ApplicationMaster itself for progress updates via the host:rpcport information obtained from the ApplicationReport. It can also use the tracking url obtained from the report if available.
  • In certain situations, if the application is taking too long or due to other factors, the client may wish to kill the application. The ClientRMProtocol supports the forceKillApplication call that allows a client to send a kill signal to the ApplicationMaster via the ResourceManager. An ApplicationMaster if so designed may also support an abort call via its rpc layer that a client may be able to leverage.
        KillApplicationRequest killRequest = 
            Records.newRecord(KillApplicationRequest.class);                
        killRequest.setApplicationId(appId);
        applicationsManager.forceKillApplication(killRequest);

Writing an ApplicationMaster

  • The ApplicationMaster is the actual owner of the job. It will be launched by the ResourceManager and via the client will be provided all the necessary information and resources about the job that it has been tasked with to oversee and complete.
  • As the ApplicationMaster is launched within a container that may (likely will) be sharing a physical host with other containers, given the multi-tenancy nature, amongst other issues, it cannot make any assumptions of things like pre-configured ports that it can listen on.
  • When the ApplicationMaster starts up, several parameters are made available to it via the environment. These include the ContainerId for the ApplicationMaster container, the application submission time and details about the NodeManager host running the Application Master. Ref ApplicationConstants for parameter names.
  • All interactions with the ResourceManager require an ApplicationAttemptId (there can be multiple attempts per application in case of failures). The ApplicationAttemptId can be obtained from the ApplicationMaster containerId. There are helper apis to convert the value obtained from the environment into objects.
        Map<String, String> envs = System.getenv();
        String containerIdString = 
            envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
        if (containerIdString == null) {
          // container id should always be set in the env by the framework 
          throw new IllegalArgumentException(
              "ContainerId not set in the environment");
        }
        ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
        ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
  • After an ApplicationMaster has initialized itself completely, it needs to register with the ResourceManager via AMRMProtocol#registerApplicationMaster. The ApplicationMaster always communicate via the Scheduler interface of the ResourceManager.
        // Connect to the Scheduler of the ResourceManager. 
        YarnConfiguration yarnConf = new YarnConfiguration(conf);
        InetSocketAddress rmAddress = 
            NetUtils.createSocketAddr(yarnConf.get(
                YarnConfiguration.RM_SCHEDULER_ADDRESS,
                YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));           
        LOG.info("Connecting to ResourceManager at " + rmAddress);
        AMRMProtocol resourceManager = 
            (AMRMProtocol) rpc.getProxy(AMRMProtocol.class, rmAddress, conf);
    
        // Register the AM with the RM
        // Set the required info into the registration request: 
        // ApplicationAttemptId, 
        // host on which the app master is running
        // rpc port on which the app master accepts requests from the client 
        // tracking url for the client to track app master progress
        RegisterApplicationMasterRequest appMasterRequest = 
            Records.newRecord(RegisterApplicationMasterRequest.class);
        appMasterRequest.setApplicationAttemptId(appAttemptID);     
        appMasterRequest.setHost(appMasterHostname);
        appMasterRequest.setRpcPort(appMasterRpcPort);
        appMasterRequest.setTrackingUrl(appMasterTrackingUrl);
    
        // The registration response is useful as it provides information about the 
        // cluster. 
        // Similar to the GetNewApplicationResponse in the client, it provides 
        // information about the min/mx resource capabilities of the cluster that 
        // would be needed by the ApplicationMaster when requesting for containers.
        RegisterApplicationMasterResponse response = 
            resourceManager.registerApplicationMaster(appMasterRequest);
  • The ApplicationMaster has to emit heartbeats to the ResourceManager to keep it informed that the ApplicationMaster is alive and still running. The timeout expiry interval at the ResourceManager is defined by a config setting accessible via YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS with the default being defined by YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. The AMRMProtocol#allocate calls to the ResourceManager count as heartbeats as it also supports sending progress update information. Therefore, an allocate call with no containers requested and progress information updated if any is a valid way for making heartbeat calls to the ResourceManager.
  • Based on the task requirements, the ApplicationMaster can ask for a set of containers to run its tasks on. The ApplicationMaster has to use the ResourceRequest class to define the following container specifications:
    • Hostname: If containers are required to be hosted on a particular rack or a specific host. ‘*’ is a special value that implies any host will do.
    • Resource capability: Currently, YARN only supports memory based resource requirements so the request should define how much memory is needed. The value is defined in MB and has to less than the max capability of the cluster and an exact multiple of the min capability. Memory resources correspond to physical memory limits imposed on the task containers.
    • Priority: When asking for sets of containers, an ApplicationMaster may define different priorities to each set. For example, the Map-Reduce ApplicationMaster may assign a higher priority to containers needed for the Map tasks and a lower priority for the Reduce tasks’ containers.
        // Resource Request
        ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
    
        // setup requirements for hosts 
        // whether a particular rack/host is needed 
        // useful for applications that are sensitive
        // to data locality 
        rsrcRequest.setHostName("*");
    
        // set the priority for the request
        Priority pri = Records.newRecord(Priority.class);
        pri.setPriority(requestPriority);
        rsrcRequest.setPriority(pri);           
    
        // Set up resource type requirements
        // For now, only memory is supported so we set memory requirements
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(containerMemory);
        rsrcRequest.setCapability(capability);
    
        // set no. of containers needed
        // matching the specifications
        rsrcRequest.setNumContainers(numContainers);
  • After defining the container requirements, the ApplicationMaster has to construct an AllocateRequest to send to the ResourceManager. The AllocateRequest consists of:
    • Requested containers: The container specifications and the no. of containers being requested for by the ApplicationMaster from the ResourceManager.
    • Released containers: There may be situations when the ApplicationMaster may have requested for more containers that it needs or due to failure issues, decide to use other containers allocated to it. In all such situations, it is beneficial to the cluster if the ApplicationMaster releases these containers back to the ResourceManager so that they can be re-allocated to other applications.
    • ResponseId: The response id that will be sent back in the response from the allocate call.
    • Progress update information: The ApplicationMaster can send its progress update (range between to 0 to 1) to the ResourceManager.
        List<ResourceRequest> requestedContainers;
        List<ContainerId> releasedContainers    
        AllocateRequest req = Records.newRecord(AllocateRequest.class);
    
        // The response id set in the request will be sent back in 
        // the response so that the ApplicationMaster can 
        // match it to its original ask and act appropriately.
        req.setResponseId(rmRequestID);
    
        // Set ApplicationAttemptId 
        req.setApplicationAttemptId(appAttemptID);
    
        // Add the list of containers being asked for 
        req.addAllAsks(requestedContainers);
    
        // If the ApplicationMaster has no need for certain 
        // containers due to over-allocation or for any other
        // reason, it can release them back to the ResourceManager
        req.addAllReleases(releasedContainers);
    
        // Assuming the ApplicationMaster can track its progress
        req.setProgress(currentProgress);
    
        AllocateResponse allocateResponse = resourceManager.allocate(req);
  • The AllocateResponse sent back from the ResourceManager provides the following information via the AMResponse object:
    • Reboot flag: For scenarios when the ApplicationMaster may get out of sync with the ResourceManager.
    • Allocated containers: The containers that have been allocated to the ApplicationMaster.
    • Headroom: Headroom for resources in the cluster. Based on this information and knowing its needs, an ApplicationMaster can make intelligent decisions such as re-prioritizing sub-tasks to take advantage of currently allocated containers, bailing out faster if resources are not becoming available etc.
    • Completed containers: Once an ApplicationMaster triggers a launch an allocated container, it will receive an update from the ResourceManager when the container completes. The ApplicationMaster can look into the status of the completed container and take appropriate actions such as re-trying a particular sub-task in case of a failure.

    One thing to note is that containers will not be immediately allocated to the ApplicationMaster. This does not imply that the ApplicationMaster should keep on asking the pending count of required containers. Once an allocate request has been sent, the ApplicationMaster will eventually be allocated the containers based on cluster capacity, priorities and the scheduling policy in place. The ApplicationMaster should only request for containers again if and only if its original estimate changed and it needs additional containers.

        // Get AMResponse from AllocateResponse 
        AMResponse amResp = allocateResponse.getAMResponse();                       
    
        // Retrieve list of allocated containers from the response 
        // and on each allocated container, lets assume we are launching 
        // the same job.
        List<Container> allocatedContainers = amResp.getAllocatedContainers();
        for (Container allocatedContainer : allocatedContainers) {
          LOG.info("Launching shell command on a new container."
              + ", containerId=" + allocatedContainer.getId()
              + ", containerNode=" + allocatedContainer.getNodeId().getHost() 
              + ":" + allocatedContainer.getNodeId().getPort()
              + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
              + ", containerState" + allocatedContainer.getState()
              + ", containerResourceMemory"  
              + allocatedContainer.getResource().getMemory());
    
          // Launch and start the container on a separate thread to keep the main 
          // thread unblocked as all containers may not be allocated at one go.
          LaunchContainerRunnable runnableLaunchContainer = 
              new LaunchContainerRunnable(allocatedContainer);
          Thread launchThread = new Thread(runnableLaunchContainer);        
          launchThreads.add(launchThread);
          launchThread.start();
        }
    
        // Check what the current available resources in the cluster are
        Resource availableResources = amResp.getAvailableResources();
        // Based on this information, an ApplicationMaster can make appropriate 
        // decisions
    
        // Check the completed containers
        // Let's assume we are keeping a count of total completed containers, 
        // containers that failed and ones that completed successfully.                     
        List<ContainerStatus> completedContainers = 
            amResp.getCompletedContainersStatuses();
        for (ContainerStatus containerStatus : completedContainers) {                               
          LOG.info("Got container status for containerID= " 
              + containerStatus.getContainerId()
              + ", state=" + containerStatus.getState()     
              + ", exitStatus=" + containerStatus.getExitStatus() 
              + ", diagnostics=" + containerStatus.getDiagnostics());
    
          int exitStatus = containerStatus.getExitStatus();
          if (0 != exitStatus) {
            // container failed 
            // -100 is a special case where the container 
            // was aborted/pre-empted for some reason 
            if (-100 != exitStatus) {
              // application job on container returned a non-zero exit code
              // counts as completed 
              numCompletedContainers.incrementAndGet();
              numFailedContainers.incrementAndGet();                                                        
            }
            else { 
              // something else bad happened 
              // app job did not complete for some reason 
              // we should re-try as the container was lost for some reason
              // decrementing the requested count so that we ask for an
              // additional one in the next allocate call.          
              numRequestedContainers.decrementAndGet();
              // we do not need to release the container as that has already 
              // been done by the ResourceManager/NodeManager. 
            }
            }
            else { 
              // nothing to do 
              // container completed successfully 
              numCompletedContainers.incrementAndGet();
              numSuccessfulContainers.incrementAndGet();
            }
          }
        }
  • After a container has been allocated to the ApplicationMaster, it needs to follow a similar process that the Client followed in setting up the ContainerLaunchContext for the eventual task that is going to be running on the allocated Container. Once the ContainerLaunchContext is defined, the ApplicationMaster can then communicate with the ContainerManager to start its allocated container.
           
        //Assuming an allocated Container obtained from AMResponse 
        Container container;   
        // Connect to ContainerManager on the allocated container 
        String cmIpPortStr = container.getNodeId().getHost() + ":" 
            + container.getNodeId().getPort();              
        InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);               
        ContainerManager cm = 
            (ContainerManager)rpc.getProxy(ContainerManager.class, cmAddress, conf);     
    
        // Now we setup a ContainerLaunchContext  
        ContainerLaunchContext ctx = 
            Records.newRecord(ContainerLaunchContext.class);
    
        ctx.setContainerId(container.getId());
        ctx.setResource(container.getResource());
    
        try {
          ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
        } catch (IOException e) {
          LOG.info(
              "Getting current user failed when trying to launch the container",
              + e.getMessage());
        }
    
        // Set the environment 
        Map<String, String> unixEnv;
        // Setup the required env. 
        // Please note that the launched container does not inherit 
        // the environment of the ApplicationMaster so all the 
        // necessary environment settings will need to be re-setup 
        // for this allocated container.      
        ctx.setEnvironment(unixEnv);
    
        // Set the local resources 
        Map<String, LocalResource> localResources = 
            new HashMap<String, LocalResource>();
        // Again, the local resources from the ApplicationMaster is not copied over 
        // by default to the allocated container. Thus, it is the responsibility 
              // of the ApplicationMaster to setup all the necessary local resources 
              // needed by the job that will be executed on the allocated container. 
    
        // Assume that we are executing a shell script on the allocated container 
        // and the shell script's location in the filesystem is known to us. 
        Path shellScriptPath; 
        LocalResource shellRsrc = Records.newRecord(LocalResource.class);
        shellRsrc.setType(LocalResourceType.FILE);
        shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);          
        shellRsrc.setResource(
            ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath)));
        shellRsrc.setTimestamp(shellScriptPathTimestamp);
        shellRsrc.setSize(shellScriptPathLen);
        localResources.put("MyExecShell.sh", shellRsrc);
    
        ctx.setLocalResources(localResources);                      
    
        // Set the necessary command to execute on the allocated container 
        String command = "/bin/sh ./MyExecShell.sh"
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
    
        List<String> commands = new ArrayList<String>();
        commands.add(command);
        ctx.setCommands(commands);
    
        // Send the start request to the ContainerManager
        StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
        startReq.setContainerLaunchContext(ctx);
        cm.startContainer(startReq);
  • The ApplicationMaster, as mentioned previously, will get updates of completed containers as part of the response from the AMRMProtocol#allocate calls. It can also monitor its launched containers pro-actively by querying the ContainerManager for the status.
        GetContainerStatusRequest statusReq = 
            Records.newRecord(GetContainerStatusRequest.class);
        statusReq.setContainerId(container.getId());
        GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
        LOG.info("Container Status"
            + ", id=" + container.getId()
            + ", status=" + statusResp.getStatus());

FAQ

How can I distribute my application’s jars to all of the nodes in the YARN cluster that need it?

You can use the LocalResource to add resources to your application request. This will cause YARN to distribute the resource to the ApplicationMaster node. If the resource is a tgz, zip, or jar – you can have YARN unzip it. Then, all you need to do is add the unzipped folder to your classpath. For example, when creating your application request:

    File packageFile = new File(packagePath);
    Url packageUrl = ConverterUtils.getYarnUrlFromPath(
        FileContext.getFileContext.makeQualified(new Path(packagePath)));

    packageResource.setResource(packageUrl);
    packageResource.setSize(packageFile.length());
    packageResource.setTimestamp(packageFile.lastModified());
    packageResource.setType(LocalResourceType.ARCHIVE);
    packageResource.setVisibility(LocalResourceVisibility.APPLICATION);

    resource.setMemory(memory)
    containerCtx.setResource(resource)
    containerCtx.setCommands(ImmutableList.of(
        "java -cp './package/*' some.class.to.Run "
        + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
        + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"))
    containerCtx.setLocalResources(
        Collections.singletonMap("package", packageResource))
    appCtx.setApplicationId(appId)
    appCtx.setUser(user.getShortUserName)
    appCtx.setAMContainerSpec(containerCtx)
    request.setApplicationSubmissionContext(appCtx)
    applicationsManager.submitApplication(request)

As you can see, the setLocalResources command takes a map of names to resources. The name becomes a sym link in your application’s cwd, so you can just refer to the artifacts inside by using ./package/*.

Note: Java’s classpath (cp) argument is VERY sensitive. Make sure you get the syntax EXACTLY correct.

Once your package is distributed to your ApplicationMaster, you’ll need to follow the same process whenever your ApplicationMaster starts a new container (assuming you want the resources to be sent to your container). The code for this is the same. You just need to make sure that you give your ApplicationMaster the package path (either HDFS, or local), so that it can send the resource URL along with the container ctx.

How do I get the ApplicationMaster’s ApplicationAttemptId?

The ApplicationAttemptId will be passed to the ApplicationMaster via the environment and the value from the environment can be converted into an ApplicationAttemptId object via the ConverterUtils helper function.

My container is being killed by the Node Manager

This is likely due to high memory usage exceeding your requested container memory size. There are a number of reasons that can cause this. First, look at the process tree that the node manager dumps when it kills your container. The two things you’re interested in are physical memory and virtual memory. If you have exceeded physical memory limits your app is using too much physical memory. If you’re running a Java app, you can use -hprof to look at what is taking up space in the heap. If you have exceeded virtual memory, you may need to increase the value of the the cluster-wide configuration variable yarn.nodemanager.vmem-pmem-ratio.

How do I include native libraries?

Setting -Djava.library.path on the command line while launching a container can cause native libraries used by Hadoop to not be loaded correctly and can result in errors. It is cleaner to use LD_LIBRARY_PATH instead.

[repost ]iOS applications reverse engineering

original:http://media.hacking-lab.com/scs3/scs3_pdf/SCS3_2011_Bachmann.pdf

iOS applications reverse engineering
Julien Bachmann – julien@scrt.ch
12
Agenda
› Motivations
› The architecture
› Mach-O
› Objective-C
› ARM
› AppStore binaries
› Find’em
› Decrypt’em
› Reverse’em
› What to look for
› Where to start
› Remote connections
› Data protection
› Conclusion3
Preamble
● Security engineer @ SCRT
● Areas of interest focused on reverse engineering,
software vulnerabilities and OS internals
● Not an Apple fanboy but like all the cool kids… 😉
● Goals of this presentation is to give a state of the art, in
45minutes, of my knowledge about iOS applications
reverse engineering
● Motivate people to do more research in user/kernel-land
iOS reverse engineering4
Motivations5
A few numbers
› +160 millions iOS users
› +400 000 applications available
› +10 billion downloads
→ (modestly) large user base6
e-banking applications7
Applications review
› Apple defined a review process
› 10% of the applications are classified as dangereous
› Cases of applications not « compliant » with
their description8
Storm8 case9
Now, what if you want to…
› check an external app ?
› verify that your application is secure ?
› check what kind of information an attacker
can get from your application ?10
Best reason ever…
› Because it’s fun to learn how to reverse new
things !11
The architecture12
Mach-O
› File format for
› Executables
› Libraries
› Core dumps13
Mach-O
› Contains three parts
› Header
› Load commands
› Data14
Mach-O
› Header15
Mach-O16
Mach-O
› Load commands
› Indicates memory layout
› Locates symbols table
› Main thread context
› Shared libraries17
Mach-O
› Data
› Segments containing sections
› __PAGEZERO
› __TEXT
› Executable code and r–
› __DATA
› rw-
› __OBJC
› …18
Mach-O
› objdump ?
› Forget about it
› Introducing : otool !19
Mach-O
› Universal / FAT files
› Supports multiples architectures
› For OSX
› Universal
› PowerPC, x86 and x86_64
› For iOS
› FAT
› armv6, armv720
Objective-C
› Programming language
› Superset of the C language
› Object oriented
› Class method calls differ from C++21
Calling methods
› C++
› ObjectPointer->Method(param1, param2)
› Obj-C
› [ObjectPointer Method:param1 param2Name:param2]22
Looking more closely
› [ObjectPointer Method]
› objc_msgSend(ObjectPointer, @selector(Method))
› Selector
› C string
› objc_msgSend(ObjectPointer, “Method”)23
ARM
› RISC
› load-store architecture
› Fixed-length 32-bit instructions
› 3-address instruction formats24
Registers
› User-level programs
› 15 general-purpose 32-bit registers : r0 → r14
› PC = r15
› Current program status register (N, Z, C, V flags, etc.)25
Load-store architecture
› Instructions can be classified into 3 groups
› Data transfer (load-store)
› Data processing
› Control flow26
Data transfer instructions
› Load from memory
› LDR r0, [r1] → r0 = mem[r1]
› Store to memory
› STR r0, [r1] → mem[r1] = r027
Data processing instructions
› Simple
› ADD r0, r1, r2 → r0 = r1 + r2
› Immediate operands
› ADD r1, r1, #1 → r1 = r1 + 1
› Shifted register operands
› ADD r3, r2, r1, LSL #3 → r3 = r2 + (r1 << 3)28
Control flow instructions
› Branch instructions
› B LABEL
› BAL LABEL
› Conditional branches
› BXX LABEL
› BEQ, BNE, BPL, BMI, …
› Conditional execution
› CMP r0, #5 → if (r0!= 5)
› ADDNE r1, r1, r0 r1 = r1 + r029
Control flow instructions
› Branch and link instructions
› BL SUBROUTINE → r14 = @next instr + jmp SUBR
› PUSH {r0-r5, LR}
› …
› POP {r0-r5, PC}30
Calling convention
› Arguments values
› r0 → r3
› Local variables
› r4 → r11
› Return value
› r031
Summing it up
› Objective-C
› [ObjectPointer Method:42]
› C++
› ObjectPointer->Method(42)
› Pseudo C
› objc_msgSend(ObjectPointer, “Method”, 42)
› ARM assembly
›32
AppStore binaries33
First of all
› Forget about the simulator
› Binaries compiled for x86 not ARM
› Need to use a jailbroken iOS device
› Tools to install
› SSH
› GDB
› … 34
Find’em
› Downloaded from the AppStore as .ipa
› ZIP file
› ~/Music/iTunes/iTunes Music/Mobile Applications/
› On iOS devices
› /var/mobile/Applications/<UUID>/<AppName>.app/35
Content of <AppName>.app*
*after download from the device to workstation. Owner set to mobile:mobile on iOS36
FAT binaries
› Binary might contain multiple versions
› Need to extract the one corresponding to our device37
Decrypt’em
› Encrypted using “FairPlay like” method
› Each executable page is encrypted with AES and a MD5
checksum is computed
› How to know if a binary is encrypted ?
› LC_ENCRYPTION_INFO
› cryptid → 1 if the binary is encrypted
› cryptoffset → offset of the encrypted data
› cryptsize → size of the encrypted data38
LC_ENCRYPTION_INFO39
Unpack the binary
› Use a script that automates the process
› crackulous
› Not leet enough;)
› “unpack your app in 5 steps and achieve
peace”
› Launch GDB
› Set a breakpoint
› Run the application
› Extract the unencrypted executable code
› Patch the architecture specific binary40
Where do I set the breakpoint ?
› Execution steps
› FAT binary is run
› Architecture specific binary is mapped in memory
› Executable code is decrypted
› Branch to start symbol
› Get start’s address41
GDB, set, run42
« Breakpoint reached capt’ain »43
Extract the executable code
› Useful information
› start
› cryptsize
›44
Patch the architecture specific binary
› Locate LC_ENCRYPTION_INFO
› Mach-O header parser
› Hexadecimal editor
› Replace cryptid
› 1 → 0
› Replace encrypted code with unpacked one45
Locate LC_ENCRYPTION_INFO
› Mach-O header parser
› Search for the load command in the binary46
Locate LC_ENCRYPTION_INFO47
Modified LC_ENCRYPTION_INFO48
Replace encrypted code49
Reverse’em
› Retrieve classes declarations
› class-dump
› Resolve objc_msgSend calls
› Useless call graph
› Need to patch the disassembly50
class-dump51
First look at the disassembly52
objc_msgSend
› As stated before
› objc_msgSend(<ref to object>, @selector(method), …)
› ARM calling convention
› arg1 → r0
› arg2 → r1
› Backtrace calls to objc_msgSend
› By hand
› Using Zynamics IDAPython scripts53
objc_helper.py54
What to look for55
Where to start
› Locate the main class
› UIApplicationDelegate
› applicationDidFinishLaunching
› ApplicationDidFinishLaunchingWithOptions
› Views
› UI*ViewController
› viewDidLoad56
applicationDidFinishLaunching57
Remote connections
› HTTP(S)
› NSURL
› …
› Sockets
› CFSocketCreate
› …58
Data protection
› Accessing the KeyChain using JB tools
› Lost iPhone ? Lost Passwords ! *
› Protect KeyChain content
› Using passcode
› setAttributes ofItemAtPath → NSFileProtectionComplete
› SecItemAdd → kSecAttrAccessibleWhenUnlocked
* http://www.sit.fraunhofer.de/forschungsbereiche/projekte/Lost_iPhone.jsp59
Data protection60
Conclusion61
Conclusion
› This is a revolution !
› This presentation was only an introduction
› Lot of work/ideas around iOS
› Grab your debugger and disassembler and work on it
› I’m open to discuss it around a few beers

[repost ] Ajax for Java developers: Write scalable Comet applications with Jetty and Direct Web Remoting (Create event-driven Web applications using Continuations and Reverse Ajax)

Summary: Ajax applications driven by asynchronous server-side events can be tricky to implement and difficult to scale. Returning to his popular series, Philip McCarthy shows an effective approach: The Comet pattern allows you to push data to clients, and Jetty 6’s Continuations API lets your Comet application scale to a large number of clients. You can conveniently take advantage of both Comet and Continuations with the Reverse Ajax technology in Direct Web Remoting 2.

View more content in this series

With Ajax firmly established as a widespread Web application-development technique, several common Ajax usage patterns have emerged. For instance, Ajax is often used in response to user input to modify parts of a page with new data fetched from the server. Sometimes, though, a Web application’s user interface needs to update in response to server-side events that occur asynchronously, without user action — for instance, to show new messages arriving in an Ajax chat application or to display changes from another user in a collaborative text editor. Because HTTP connections between a Web browser and a server can be established only by the browser, the server can’t “push” changes to the browser as they occur.

Ajax applications can use two fundamental approaches to work around this problem: either the browser can poll the server for updates every few seconds, or the server can hold open a connection from the browser and pass data as it becomes available. This long-lived connection technique has become known as Comet (see Resources). This article shows how you can use the Jetty servlet engine and DWR together to implement a Comet Web application simply and efficiently.

Why Comet?

The main drawback of the polling approach is the amount of traffic generated as it scales to many clients. Each client must regularly hit the server to check for updates, which places a burden on the server’s resources. The worst situation is an application involving infrequent updates, such as an Ajax mail Inbox. In this case, the vast majority of client polls would be redundant, with the server simply answering “no data yet.” The server load can be alleviated by increasing the polling interval, but this has the undesirable consequence of introducing a lag between a server event and the client’s awareness of it. Of course, a reasonable balance can be found for many applications, and polling works acceptably well.

Nevertheless, one of the appeals of the Comet strategy is its perceived efficiency. Clients don’t produce the noisy traffic characteristic of polling, and as soon as events occur, they can be published to the client. But holding open long-lived connections consumes server resources too. While a servlet holds a persistent request in a waiting state, that servlet is monopolizing a thread. This limits Comet’s scalability with a traditional servlet engine because the number of clients can quickly overwhelm the number of threads that the server stack can handle efficiently.

Back to top

How Jetty 6 differs

Jetty 6 is designed to scale to large numbers of simultaneous connections, exploiting the Java™ language’s nonblocking I/O (java.nio) libraries and using an optimised output-buffer architecture (see Resources). Jetty also has a trick up its sleeve for dealing with long-lived connections: a feature known as Continuations. I’ll demonstrate Continuations with a simple servlet that receives a request, waits for two seconds, and then sends a response. Next, I’ll show what happens when the server has more clients than it has threads to handle them. Finally, I’ll reimplement the servlet using Continuations, and you’ll see the difference that they make.

To make it easier to follow what’s happening in the following examples, I’ll restrict the Jetty servlet engine to a single request-handling thread. Listing 1 shows the relevant configuration in jetty.xml. I actually need to allow a total of three threads in theThreadPool: the Jetty server itself uses one, and another runs an HTTP connector, listening for incoming requests. This leaves one thread to execute servlet code.
Listing 1. Jetty configuration for a single servlet thread

                
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN"
  "http://jetty.mortbay.org/configure.dtd">
<Configure id="Server">
    <Set name="ThreadPool">
      <New>
        <Set name="minThreads">3</Set>
        <Set name="lowThreads">0</Set>
        <Set name="maxThreads">3</Set>
      </New>
    </Set>
</Configure>

 

Next, to simulate waiting for an asynchronous event, Listing 2 shows the service() method for BlockingServlet, which simply uses a Thread.sleep() call to pause for 2,000 milliseconds before completing. It also outputs the system time at the beginning and end of execution. To help disambiguate output from different requests, it also logs a request parameter used as an identifier.
Listing 2. BlockingServlet

                
public class BlockingServlet extends HttpServlet {

  public void service(HttpServletRequest req, HttpServletResponse res)
                                              throws java.io.IOException {

    String reqId = req.getParameter("id");

    res.setContentType("text/plain");
    res.getWriter().println("Request: "+reqId+"\tstart:\t" + new Date());
    res.getWriter().flush();

    try {
      Thread.sleep(2000);
    } catch (Exception e) {}

    res.getWriter().println("Request: "+reqId+"\tend:\t" + new Date());
  }
}

 

Now you can observe the servlet’s behaviour in response to several simultaneous requests. Listing 3 shows the console output of five parallel requests using lynx. The command line simply launches five lynx processes, appending an identifying ordinal to the request URL.
Listing 3. Output from several concurrent requests to BlockingServlet

                
$ for i in 'seq 1 5'  ; do lynx -dump localhost:8080/blocking?id=$i &  done
Request: 1      start:  Sun Jul 01 12:32:29 BST 2007
Request: 1      end:    Sun Jul 01 12:32:31 BST 2007

Request: 2      start:  Sun Jul 01 12:32:31 BST 2007
Request: 2      end:    Sun Jul 01 12:32:33 BST 2007

Request: 3      start:  Sun Jul 01 12:32:33 BST 2007
Request: 3      end:    Sun Jul 01 12:32:35 BST 2007

Request: 4      start:  Sun Jul 01 12:32:35 BST 2007
Request: 4      end:    Sun Jul 01 12:32:37 BST 2007

Request: 5      start:  Sun Jul 01 12:32:37 BST 2007
Request: 5      end:    Sun Jul 01 12:32:39 BST 2007

 

The output in Listing 3 is no surprise. Because only one thread is available for Jetty to execute the servlet’s service() method, Jetty queues each request and services it serially. The time stamps show that immediately after a response is dispatched for one request (an end message), the servlet begins working on the next request (the subsequent start message). So even though all five requests were sent simultaneously, one request must wait eight seconds before the servlet can handle it.

Remember that no useful work is being performed while the servlet blocks. This code simulates a situation where the request is waiting for an event to arrive asynchronously from another part of the application. The server is neither CPU- nor I/O-bound here: requests are queued only as a result of thread-pool exhaustion.

Now, check out how the Continuations feature in Jetty 6 can be helpful in this kind of situation. Listing 4 shows theBlockingServlet from Listing 2 rewritten using the Continuations API. I’ll explain the code a little later.
Listing 4. ContinuationServlet

                
public class ContinuationServlet extends HttpServlet {

  public void service(HttpServletRequest req, HttpServletResponse res)
                                              throws java.io.IOException {

    String reqId = req.getParameter("id");

    Continuation cc = ContinuationSupport.getContinuation(req,null);

    res.setContentType("text/plain");
    res.getWriter().println("Request: "+reqId+"\tstart:\t"+new Date());
    res.getWriter().flush();

    cc.suspend(2000);

    res.getWriter().println("Request: "+reqId+"\tend:\t"+new Date());
  }
}

 

Listing 5 shows the output from five simultaneous requests to ContinuationServlet; compare with Listing 3.
Listing 5. Output from several concurrent requests to ContinuationServlet

                
$ for i in 'seq 1 5'  ; do lynx -dump localhost:8080/continuation?id=$i &  done

Request: 1      start:  Sun Jul 01 13:37:37 BST 2007
Request: 1      start:  Sun Jul 01 13:37:39 BST 2007
Request: 1      end:    Sun Jul 01 13:37:39 BST 2007

Request: 3      start:  Sun Jul 01 13:37:37 BST 2007
Request: 3      start:  Sun Jul 01 13:37:39 BST 2007
Request: 3      end:    Sun Jul 01 13:37:39 BST 2007

Request: 2      start:  Sun Jul 01 13:37:37 BST 2007
Request: 2      start:  Sun Jul 01 13:37:39 BST 2007
Request: 2      end:    Sun Jul 01 13:37:39 BST 2007

Request: 5      start:  Sun Jul 01 13:37:37 BST 2007
Request: 5      start:  Sun Jul 01 13:37:39 BST 2007
Request: 5      end:    Sun Jul 01 13:37:39 BST 2007

Request: 4      start:  Sun Jul 01 13:37:37 BST 2007
Request: 4      start:  Sun Jul 01 13:37:39 BST 2007
Request: 4      end:    Sun Jul 01 13:37:39 BST 2007

 

There are two important things to note in Listing 5. First, each start message appears twice; don’t worry about this for the moment. Second, and more important, the requests are now handled concurrently, without queueing. Note that the time stamps of all the start and end messages are the same, at least at this resolution. Consequently, no request takes longer than two seconds to complete, even though only a single servlet thread is running.

Back to top

Inside Jetty’s Continuations mechanism

An understanding of how Jetty’s Continuations mechanism is implemented will explain the effects you see in Listing 5. To use Continuations, Jetty must be configured to handle requests with its SelectChannelConnector. This connector is built on thejava.nio APIs, allowing it to hold connections open without consuming a thread for each. When theSelectChannelConnector is used, ContinuationSupport.getContinuation() provides an instance ofSelectChannelConnector.RetryContinuation. (However, you should code against the Continuation interface only; seePortability and the Continuations API.) When suspend() is called on RetryContinuation, it throws a special runtime exception — RetryRequest — which propagates out of the servlet and back through the filter chain and is caught inSelectChannelConnector. But instead of sending any response to the client as a result of the exception, the request is held in a queue of pending Continuations, and the HTTP connection is kept open. At this point, the thread that was used to service the request is returned to the ThreadPool, where it can be used to service another request.

Portability and the Continuations API

I mentioned you should use Jetty’sSelectChannelConnector to enable Continuations functionality. However, the Continuations API is still valid with a traditional SocketConnector, in which case Jetty falls back to a different Continuationimplementation that uses wait()/notify()behaviour. Your code will still compile and run, but without the benefits of nonblocking Continuations. If you want to keep the option of using a non-Jetty server, you could consider writing your own Continuationwrapper that uses reflection to check for the availability of Jetty’s Continuations library at run time. DWR uses this strategy.

The suspended request remains in the pending Continuations queue until either the specified timeout expires, or the resume()method is called on its Continuation (more on this later). When either of these conditions occurs, the request is resubmitted to the servlet (via the filter chain). In effect, the entire request is “replayed” up until the point where suspend() was first called. When execution reaches the suspend() call the second time, the RetryRequestexception is not thrown, and execution continues as normal.

The output in Listing 5 should make sense now. As each request, in turn, enters the servlet’s service() method, the start message is sent in response, and then the Continuation‘s suspend() method causes execution to leave the servlet, freeing up the thread to begin servicing the next request. All five requests quickly run through the first part of the service() method and enter the suspended state, and all of the start messages are output within milliseconds. Two seconds later, as the suspend() timeouts expire, the first request is retrieved from the pending queue and resubmitted to theContinuationServlet. The start message is output a second time, the second call to suspend() returns immediately, and the end message is sent in response. The servlet code then executes again for the next queued request, and so on.

So, in both the BlockingServlet and ContinuationServlet cases, requests are queued for access to the single servlet thread. However, while the two-second pause in BlockingServlet occurs inside the servlet’s thread of execution,ContinuationServlet‘s pause occurs outside of the servlet in SelectChannelConnector. The overall throughput ofContinuationServlet is higher because the servlet thread isn’t tied up most of the time in a sleep() call.

Back to top

Making Continuations useful

Now that you’ve seen that Continuations allow servlet requests to be suspended without thread consumption, I need to explain a little bit more of the Continuations API to show you how to use Continuations for practical purposes.

resume() method forms a pair with suspend(). You can think of them of as the Continuations equivalent of the standardObject wait()/notify() mechanism. That is, suspend() puts a Continuation (and therefore the execution of the current method) on hold until either its timeout expires or another thread calls resume(). The suspend()/resume() pair is key to implementing a real Comet-style service using Continuations. The basic pattern is to obtain the Continuation from the current request, call suspend(), and wait until your asynchronous event arrives. Then call resume() and generate a response.

However, unlike the true language-level continuations in languages such as Scheme, or indeed the Java language’swait()/notify() paradigm, calling resume() on a Jetty Continuation doesn’t mean that code execution picks up exactly where it left off. As you’ve seen, what actually happens is that the request associated with the Continuation is replayed. This results in two problems: undesirable reexecution of code as in ContinuationServlet in Listing 4, and loss of state: anything in scope when the call is made to suspend() is lost.

The solution to the first of these issues is the isPending() method. If the return value of isPending() is true, this means thatsuspend() has been called previously, and execution of the retried request has not yet reached suspend() for the second time. In other words, making code prior to your suspend() call conditional on isPending() ensures that it executes only once per request. It’s best to design your application code before the suspend() call to be idempotent, so that calling it twice won’t matter anyway, but where that isn’t possible you can use isPending()Continuation also offers a simple mechanism for preserving state: the putObject(Object) and getObject() methods. Use these to hold a context object with any state you need to preserve when the Continuation is suspended. You can also use this mechanism as a way of passing event data between threads, as you’ll see later on.

Back to top

Writing a Continuations-based application

As a vaguely real-world example scenario, I’m going to develop a basic GPS coordinate-tracking Web application. It will generate randomised latitude-longitude pairs at irregular intervals. With some imagination, the coordinates generated could be the positions of nearby public transport, marathon runners carrying GPS devices, cars in a rally, or the location of a package in transit. The interesting part is how I tell the browser about the coordinates. Figure 1 shows a class diagram for this simple GPS-tracker application:
Figure 1. Class diagram showing major components of the GPS tracker application
UML class diagram of GPS tracker components

First, the application needs something that generates coordinates. This is what RandomWalkGenerator does. Starting from an initial coordinate pair, each call to its private generateNextCoord() method takes a random constrained step away from that location and returns the new position as a GpsCoord object. When initialized, RandomWalkGenerator creates a thread that calls the generateNextCoord() method at randomized intervals and then sends the generated coordinate to any CoordListenerinstances that have registered themselves with addListener()Listing 6 shows the logic of RandomWalkGenerator‘s loop:
Listing 6. RandomWalkGenerator’s run() method

                
public void run() {

  try {
    while (true) {
      int sleepMillis = 5000 + (int)(Math.random()*8000d);
      Thread.sleep(sleepMillis);
      dispatchUpdate(generateNextCoord());
    }
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
}

 

CoordListener is a callback interface that just defines the onCoord(GpsCoord coord) method. In this example, theContinuationBasedTracker class implements CoordListener. The other public method on ContinuationBasedTrackeris getNextPosition(Continuation, int)Listing 7 shows the implementation of these methods:
Listing 7. The innards of ContinuationBasedTracker

                
public GpsCoord getNextPosition(Continuation continuation, int timeoutSecs) {

  synchronized(this) {
    if (!continuation.isPending()) {
      pendingContinuations.add(continuation);
    }

    // Wait for next update
    continuation.suspend(timeoutSecs*1000);
  }

  return (GpsCoord)continuation.getObject();
}

public void onCoord(GpsCoord gpsCoord) {

  synchronized(this) {
    for (Continuation continuation : pendingContinuations) {

      continuation.setObject(gpsCoord);
      continuation.resume();
    }

    pendingContinuations.clear();
  }
}

 

When a client calls getNextPosition() with a Continuation, the isPending method checks that the request is not being retried at this point, then adds it to a collection of Continuations that are waiting for a coordinate. Then the Continuation is suspended. Meanwhile, onCoord — invoked when a new coordinate is generated — simply loops over any pendingContinuations, sets the GPS coordinate on them, and resumes them. Each retried request then completes execution ofgetNextPosition(), retrieving the GpsCoord from the Continuation and returning it to the caller. Note the need for synchronization here, both to guard against inconsistent state in the pendingContinuations collection and to ensure that a newly added Continuation isn’t resumed before it has been suspended.

The final piece of the puzzle is the servlet code itself, shown in Listing 8:
Listing 8. GPSTrackerServlet implementation

                
public class GpsTrackerServlet extends HttpServlet {

    private static final int TIMEOUT_SECS = 60;
    private ContinuationBasedTracker tracker = new ContinuationBasedTracker();

    public void service(HttpServletRequest req, HttpServletResponse res)
                                                throws java.io.IOException {

      Continuation c = ContinuationSupport.getContinuation(req,null);
      GpsCoord position = tracker.getNextPosition(c, TIMEOUT_SECS);

      String json = new Jsonifier().toJson(position);
      res.getWriter().print(json);
    }
}

 

As you can see, this servlet does very little. It simply obtains the request’s Continuation, calls getNextPosition(), converts the GPSCoord into JavaScript Object Notation (JSON), and writes it out. Nothing here needs protection from reexecution, so I don’t need to check isPending()Listing 9 shows the output of a call to the GpsTrackerServlet, again with five simultaneous requests but only a single available thread on the server:
Listing 9. Output of GPSTrackerServlet

                
$  for i in 'seq 1 5'  ; do lynx -dump localhost:8080/tracker &  done
   { coord : { lat : 51.51122, lng : -0.08103112 } }
   { coord : { lat : 51.51122, lng : -0.08103112 } }
   { coord : { lat : 51.51122, lng : -0.08103112 } }
   { coord : { lat : 51.51122, lng : -0.08103112 } }
   { coord : { lat : 51.51122, lng : -0.08103112 } }

 

This example is unspectacular but serves as a proof-of-concept. After the requests are dispatched, they are held open for several seconds until the coordinate is generated, at which point the responses are quickly generated. This is the basis of the Comet pattern, with Jetty handling five concurrent requests on one thread, thanks to Continuations.

Back to top

Creating a Comet client

Now that you’ve seen how Continuations can be used in principle to create nonblocking Web services, you might wonder how to create client-side code to exploit this ability. A Comet client needs to:

  1. Hold open an XMLHttpRequest connection until a response is received.
  2. Dispatch that response to the appropriate JavaScript handler.
  3. Immediately establish a new connection.

A more advanced Comet setup could use one connection to push data from several different services to the browser, with appropriate routing mechanisms on the client and server. One possibility would be to write client-side code against a JavaScript library such as Dojo, which provides Comet-based request mechanisms in the shape of dojo.io.cometd.

However, if you’re working with the Java language on the server, a great way to get advanced Comet support on both client and server is to use the DWR 2 (see Resources). If you’re not familiar with DWR, you can read Part 3 of this series, “Ajax with Direct Web Remoting.” DWR transparently provides an HTTP-RPC transport layer, exposing your Java objects to calls across the Web from JavaScript code. DWR generates client-side proxies, automatically marshalls and unmarshalls data, handles security concerns, provides a convenient client-side utility library, and works in all major browsers.

Back to top

DWR 2: Reverse Ajax

Newly introduced with DWR 2 is the concept of Reverse Ajax. This is a mechanism by which server-side events are “pushed” to the client. Client-side DWR code transparently deals with establishing connections and parsing responses, so from a developer’s point of view, events can simply be published to the client from server-side Java code.

DWR can be configured to use three different mechanisms for Reverse Ajax. One is the familiar polling approach. The second, known as piggyback, doesn’t create any connections to the server. Instead, it waits until another DWR service call occurs and piggybacks pending events onto this request’s response. This makes it highly efficient but means that client notification of events is delayed until the client makes an unrelated call. The final mechanism uses long-lived, Comet-style connections. And best of all, DWR can auto-detect when it’s running under Jetty and switch to using Continuations for nonblocking Comet.

I’ll adapt my GPS example to use Reverse Ajax with DWR 2. On the way, you’ll see in more detail how Reverse Ajax works.

I no longer need my servlet. DWR provides a controller servlet that mediates client requests directly onto Java objects. I also no longer need to deal explicitly with Continuations because DWR takes care of this under the hood. So I simply need a newCoordListener implementation that publishes coordinate updates to any client browsers.

An interface called ServerContext provides DWR’s Reverse Ajax magic. ServerContext is aware of all Web clients currently viewing a given page and can provide a ScriptSession to talk to each. This ScriptSession is used to push JavaScript fragments to the client from Java code. Listing 10 shows how the ReverseAjaxTracker responds to coordinate notifications, using them to generate calls to the client-side updateCoordinate() function. Note that the appendData() call on the DWRScriptBuffer object automatically marshalls a Java object to JSON, if a suitable converter is available.
Listing 10. The notification callback method in ReverseAjaxTracker

                
public void onCoord(GpsCoord gpsCoord) {

  // Generate JavaScript code to call client-side
  // function with coord data
  ScriptBuffer script = new ScriptBuffer();
  script.appendScript("updateCoordinate(")
    .appendData(gpsCoord)
    .appendScript(");");

  // Push script out to clients viewing the page
  Collection<ScriptSession> sessions = 
            sctx.getScriptSessionsByPage(pageUrl);

  for (ScriptSession session : sessions) {
    session.addScript(script);
  }   
}

 

Next, DWR must be configured to know about ReverseAjaxTracker. In a larger application, DWR’s Spring integration could be leveraged to provide DWR with Spring-created beans. Here, however, I’ll just have DWR create a new instance ofReverseAjaxTracker and place it in the application scope. All subsequent DWR requests will then access this single instance.

I also need to tell DWR how to marshall data from GpsCoord beans into JSON. Because GpsCoord is a simple object, DWR’s reflection-based BeanConverter is sufficient. Listing 11 shows the configuration for ReverseAjaxTracker:
Listing 11. DWR configuration for ReverseAjaxTracker

                
<dwr>
   <allow>
      <create creator="new" javascript="Tracker" scope="application">
         <param name="class" value="developerworks.jetty6.gpstracker.ReverseAjaxTracker"/>
      </create>

      <convert converter="bean" match="developerworks.jetty6.gpstracker.GpsCoord"/>
   </allow>
</dwr>

 

The create element’s javascript attribute specifies a name that DWR uses to expose the tracker as a JavaScript object. However, in this case, my client-side code won’t use it, instead having data pushed to it from the tracker. Also, some extra configuration in web.xml is needed to configure DWR for Reverse Ajax, as shown in Listing 12:
Listing 12. web.xml configuration for DwrServlet

                
<servlet>
   <servlet-name>dwr-invoker</servlet-name>
   <servlet-class>
      org.directwebremoting.servlet.DwrServlet
   </servlet-class>
   <init-param>
      <param-name>activeReverseAjaxEnabled</param-name>
      <param-value>true</param-value>
   </init-param>
   <init-param>
      <param-name>initApplicationScopeCreatorsAtStartup</param-name>
      <param-value>true</param-value>
   </init-param>
   <load-on-startup>1</load-on-startup>
</servlet>

 

The first servlet init-paramactiveReverseAjaxEnabled, activates polling and Comet functionality. The second,initApplicationScopeCreatorsAtStartup, tells DWR to initialize the ReverseAjaxTracker at application startup time. This overrides the usual behaviour of lazy initialization when the first request on a bean is made — necessary in this case, because the client never does actively call a method on the ReverseAjaxTracker.

Finally, I need to implement the client-side JavaScript function invoked from DWR. The callback — updateCoordinate() — is passed a JSON representation of a GpsCoord Java bean, auto-serialized by DWR’s BeanConverter. The function just extracts the longitude and latitude fields from the coordinate and appends them to a list via Document Object Model (DOM) calls. This is shown in Listing 13, along with my page’s onload function. The onload contains a call todwr.engine.setActiveReverseAjax(true), which tells DWR to open a persistent connection to the server and await callbacks.
Listing 13. Client-side implementation of trivial Reverse Ajax GPS tracker

                
window.onload = function() {
  dwr.engine.setActiveReverseAjax(true);
}

function updateCoordinate(coord) {
  if (coord) {
    var li = document.createElement("li");
    li.appendChild(document.createTextNode(
            coord.longitude + ", " + coord.latitude)
    );
    document.getElementById("coords").appendChild(li);
  }
}

 

Updating the page without JavaScript

If you want to minimize the amount of JavaScript code in your application, there’s an alternative to writing out JavaScript callbacks with ScriptSession: You can wrap ScriptSession instances in a DWR Util object. This class provides simple Java methods for manipulating the browser DOM directly, and it auto-generates the necessary script behind the scenes.

Now I can point my browser to the tracker page, and DWR will begin pushing coordinate data to the client as it is generated. This implementation simply outputs a list of the generated coordinates, as shown in Figure 2:
Figure 2. Output of the ReverseAjaxTracker
Simple Web page listing generated coordinates

That’s how simple it is to create an event-driven Ajax application using Reverse Ajax. And remember, thanks to DWR’s exploitation of Jetty Continuations, no threads are tied up on the server while the client is waiting for a new event to arrive.

From here, it’s easy to integrate a map widget from the likes of Yahoo! or Google. By changing the client-side callback, coordinates can simply be passed to the map API, instead of appended directly onto the page. Figure 3 shows the DWR Reverse Ajax GPS tracker plotting the random walk on such a mapping component:
Figure 3. ReverseAjaxTracker with a map UI
Map showing path tracing generated coordinates

Back to top

Conclusions

You’ve now seen how Jetty Continuations combined with Comet can provide an efficient, scalable solution for event-driven Ajax applications. I haven’t given any figures for the scalability of Continuations because performance in a real-world application depends on so many variables. Server hardware, choice of operating system, JVM implementation, Jetty configuration, and indeed your Web application’s design and traffic profile all affect the performance of Jetty’s Continuations under load. However, Greg Wilkins of Webtide (the main Jetty developers) has published a white paper on Jetty 6 that compares the performance of a Comet application with and without Continuations, handling 10,000 concurrent requests (see Resources). In Greg’s tests, using Continuations cuts thread consumption, and concomitantly stack memory consumption, by a factor of more than 10.

You’ve also seen how easy it is to implement an event-driven Ajax application using DWR’s Reverse Ajax technology. Not only does DWR save you much client- and server-side coding, but Reverse Ajax also abstracts the whole server-push mechanism away from your code. You can switch freely among the Comet, polling, or even piggyback methods, simply by altering DWR’s configuration. You’re free to experiment and find the best-performing strategy for your application, without any impact on your code.

If you’d like to experiment with your own Reverse Ajax applications, a great way to learn more is to download and examine the code of the DWR demos (part of the DWR source-code distribution, see Resources). The sample code used in this article is also available (see Download) if you’d like to run the examples for yourself.

 

Back to top

Download

Description Name Size Download method
Sample code jetty-dwr-comet-src.tgz 8KB HTTP

Information about download methods

 

Resources

Learn

Get products and technologies

  • Jetty: Download Jetty.
  • DWR: Download DWR.

Discuss

About the author

Philip McCarthy is a London-based software-development consultant specializing in Java and Web technologies. Past work includes projects for Orange and Hewlett Packard Labs. His current focus is Web-based financial systems built with open source frameworks.

original:http://www.ibm.com/developerworks/java/library/j-jettydwr/index.html